CN108008975A - Method and device for processing image data based on the KNL platform - Google Patents
- Publication number
- CN108008975A (application number CN201711407553.9A)
- Authority
- CN
- China
- Prior art keywords
- maximum
- absolute value
- view data
- pixel absolute
- data
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Image Processing (AREA)
Abstract
An embodiment of the invention discloses a method and device for processing image data based on the KNL platform. The method includes: obtaining image data to be processed; computing the maximum absolute pixel value of the image data with a multi-threaded parallel algorithm, and determining the index value of that maximum from the computed maximum; computing, from the index value of the maximum absolute pixel value, the image region of the image data that needs to be updated; updating the image data of that region with a multi-threaded parallel algorithm; and outputting the updated image data. The embodiment processes image data with multiple threads in parallel, effectively improves memory access efficiency, and greatly improves the computational performance of the algorithm on CPU platforms.
Description
Technical field
The present invention relates to scientific data processing, and in particular to a method and device for processing image data based on the KNL platform.
Background
Science Data Processing (SDP) is the most critical part of the international Square Kilometre Array (SKA) radio telescope project and a great challenge to humanity's data-processing capability. SDP mainly processes the astronomical signals collected by the telescope: the signals go through detection, digitization and correlation, and are finally converted into astronomical images for astronomers to study. The main algorithms involved in SDP include gridding ((de)Gridding), deconvolution ((de)Convolution) and the Fast Fourier Transform (FFT).
The deConvolution algorithm in SDP is one of the key algorithms of SKA science data processing. According to the SKA design documents, deConvolution accounts for about 20% of the total SDP computation. The computation required by the SKA project is enormous: the required computing capability is about 5 times that of Sunway TaihuLight, currently the fastest supercomputer in the world.
In the related art, the deConvolution algorithm runs serially on the Knights Landing (KNL) many-core processor, which has the following drawbacks: (1) computing resources cannot be used efficiently, so the program takes a long time; (2) running on a CPU platform, the algorithm suffers from low memory access efficiency and weak computing capability.
Summary of the invention
To solve the above technical problem, an embodiment of the present invention provides a method and device for processing image data based on the KNL platform, which use computing resources efficiently, reduce program run time, improve memory access efficiency, and improve computational performance on CPU platforms.
To achieve the object of the invention, an embodiment provides a method for processing image data based on the KNL platform, comprising:
obtaining image data to be processed;
computing the maximum absolute pixel value of the image data with a multi-threaded parallel algorithm, and determining, from the computed maximum, the index value of the maximum absolute pixel value of the image data;
computing, from the index value of the maximum absolute pixel value, the image region of the image data that needs to be updated;
updating the image data of the region that needs to be updated with a multi-threaded parallel algorithm;
outputting the updated image data;
where the index value of the maximum absolute pixel value indicates the position of that maximum in the image data.
Preferably, computing the maximum absolute pixel value of the image data with a multi-threaded parallel algorithm includes:
dividing the image data into multiple basic data blocks;
processing the basic data blocks in parallel with multiple threads, each thread using the findPeak function to find the maximum absolute pixel value within its basic data block.
Preferably, the method further includes:
before using the findPeak function to find the maximum absolute pixel value in each basic data block and determining, from the computed maximum, the index value of the maximum absolute pixel value, creating two temporary memory spaces in memory;
after the maximum absolute pixel value in each basic data block has been found and its index value determined, storing, according to the identifier of each thread, the block maximum found by that thread and the index value of that maximum into the corresponding temporary memory space, forming temporary arrays;
where the temporary arrays include a first array and a second array: the first array contains the maximum absolute pixel values of the basic data blocks, one per block, and the second array contains the index values of those maxima, one per block.
Preferably, the method further includes:
before creating the two temporary memory spaces in memory, creating an allocator template with two parameters: the data type to allocate and the alignment in bytes;
calling the allocator template when creating the two temporary memory spaces.
Preferably, the method further includes:
determining, from the first array, the largest of the maximum absolute pixel values and defining it as the first maximum; the first maximum is the maximum absolute pixel value of the image data.
Preferably, the method further includes:
determining the first maximum from the first array with a serial algorithm.
Preferably, computing, from the index value of the maximum absolute pixel value, the image region of the image data that needs to be updated includes:
determining the image region that needs to be updated from the index value corresponding to the first maximum and the subtractPSF function;
where the index value corresponding to the first maximum is the index value of the maximum absolute pixel value of the image data.
Preferably, updating the image data of the region that needs to be updated with a multi-threaded parallel algorithm includes:
dividing the image data of the region that needs to be updated into multiple sub-blocks;
processing the sub-blocks in parallel with multiple threads, each thread using the subtractPSF function to update the image data of its sub-block.
In addition, to achieve the above object, the invention also provides a device for processing image data based on the KNL platform, characterized in that the device comprises:
an acquisition module, for obtaining image data to be processed;
a first processing module, for computing the maximum absolute pixel value of the image data with a multi-threaded parallel algorithm and determining, from the computed maximum, the index value of the maximum absolute pixel value of the image data;
a second processing module, for computing, from the index value of the maximum absolute pixel value, the image region of the image data that needs to be updated;
an output module, for outputting the updated image data;
where the index value of the maximum absolute pixel value of the image data indicates the position of that maximum in the image data.
Preferably, the first processing module computing the maximum absolute pixel value of the image data with a multi-threaded parallel algorithm includes:
dividing the image data into multiple basic data blocks;
processing the basic data blocks in parallel with multiple threads, each thread using the findPeak function to find the maximum absolute pixel value within its basic data block.
Preferably, the first processing module is further configured to:
before using the findPeak function to find the maximum absolute pixel value in each basic data block and determining, from the computed maximum, the index value of the maximum absolute pixel value, create two temporary memory spaces in memory;
after the maximum absolute pixel value in each basic data block has been found and its index value determined, store, according to the identifier of each thread, the block maximum found by that thread and the index value of that maximum into the corresponding temporary memory space, forming temporary arrays;
where the temporary arrays include a first array and a second array: the first array contains the maximum absolute pixel values of the basic data blocks, one per block, and the second array contains the index values of those maxima, one per block.
Preferably, the first processing module is further configured to:
before creating the two temporary memory spaces in memory, create an allocator template with two parameters: the data type to allocate and the alignment in bytes;
call the allocator template when creating the two temporary memory spaces.
Preferably, the first processing module is further configured to:
determine, from the first array, the largest of the maximum absolute pixel values and define it as the first maximum; the first maximum is the maximum absolute pixel value of the image data.
Preferably, the first processing module is further configured to:
determine the first maximum from the first array with a serial algorithm.
Preferably, the second processing module computing, from the index value of the maximum absolute pixel value, the image region of the image data that needs to be updated includes:
determining the image region that needs to be updated from the index value corresponding to the first maximum and the subtractPSF function;
where the index value corresponding to the first maximum is the index value of the maximum absolute pixel value of the image data.
Preferably, updating the image data of the region that needs to be updated with a multi-threaded parallel algorithm includes:
dividing the image data of the region that needs to be updated into multiple sub-blocks;
processing the sub-blocks in parallel with multiple threads, each thread using the subtractPSF function to update the image data of its sub-block.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for processing image data based on the KNL platform.
The invention proposes a method for processing image data based on the KNL platform, comprising: obtaining image data to be processed; computing the maximum absolute pixel value of the image data with a multi-threaded parallel algorithm and determining, from the computed maximum, the index value of the maximum absolute pixel value; computing, from that index value, the image region of the image data that needs to be updated; updating the image data of that region with a multi-threaded parallel algorithm; and outputting the updated image data. This solution addresses the problems the deConvolution algorithm currently has on CPU platforms, such as using few compute cores, low memory access efficiency and slow run time. It uses computing resources efficiently, reduces program run time, improves memory access efficiency, and improves computational performance on CPU platforms.
Other features and advantages of the invention are set forth in the description that follows, and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the drawings.
Brief description of the drawings
The drawings provide a further understanding of the technical solution of the invention and form part of the description. Together with the embodiments of this application they serve to explain the technical solution, and they do not limit it.
Fig. 1 is a flowchart of the method for processing image data based on the KNL platform according to an embodiment of the invention;
Fig. 2 is a flow diagram of the findPeak function algorithm;
Fig. 3 is a flow diagram of the subtractPSF function algorithm;
Fig. 4 is a schematic diagram of the device for processing image data based on the KNL platform according to an embodiment of the invention.
Embodiment
To make the objects, technical solutions and advantages of the invention clearer, the embodiments of the invention are described in detail below with reference to the drawings. Note that, where there is no conflict, the embodiments of this application and the features in them may be combined with one another.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
The deConvolution algorithm in SDP is one of the key algorithms in SKA science data processing. The computational core of the deConvolution algorithm is as follows:
Explicitly indicated that out from the calculating core process of above-mentioned deConvolution algorithms, calculate core by g_niters times
Iteration forms, and each iteration is performed both by following operation:(1) by findPeak functions find whole image data maximum with
And the maximum of the index (2) of the maximum of view data whole image data that basis searches out in subtractPSF functions
The index of the maximum of value and view data finds out the figure that the view data (3) for needing to update in view data updates needs
As data are updated and export the view data after renewal.
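For illustration, a minimal serial C++ sketch of such a core loop follows. It assumes a Högbom-CLEAN-style update in which the point spread function (PSF), scaled by a gain factor and the peak value, is subtracted around the peak; the text does not name the variant. The names findPeak, subtractPSF and g_niters come from the text, while the signatures, the gain parameter and the square, row-major image layout are assumptions and not the patent's own code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical serial sketch of the deConvolution core described above.

// (1) Returns the index of the pixel with the largest absolute value and
// writes the signed pixel value found there to peakVal.
std::size_t findPeak(const std::vector<float>& image, float& peakVal) {
    std::size_t peakIdx = 0;
    float maxAbs = -1.0f;
    for (std::size_t i = 0; i < image.size(); ++i) {
        const float a = std::fabs(image[i]);
        if (a > maxAbs) { maxAbs = a; peakIdx = i; }
    }
    peakVal = image.empty() ? 0.0f : image[peakIdx];
    return peakIdx;
}

// (2)+(3) Places the PSF peak on the image peak and subtracts
// gain * peakVal * psf over the PSF footprint clipped to the image.
void subtractPSF(const std::vector<float>& psf, int psfWidth, std::size_t psfPeak,
                 std::vector<float>& residual, int width, std::size_t peakIdx,
                 float peakVal, float gain) {
    const int rx = static_cast<int>(peakIdx % width), ry = static_cast<int>(peakIdx / width);
    const int px = static_cast<int>(psfPeak % psfWidth), py = static_cast<int>(psfPeak / psfWidth);
    const int dx = rx - px, dy = ry - py;  // shift from PSF to image coordinates
    const int x0 = std::max(0, dx), x1 = std::min(width - 1, dx + psfWidth - 1);
    const int y0 = std::max(0, dy), y1 = std::min(width - 1, dy + psfWidth - 1);
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            residual[y * width + x] -= gain * peakVal * psf[(y - dy) * psfWidth + (x - dx)];
}

// Core loop: g_niters iterations of findPeak followed by subtractPSF.
void deconvolve(const std::vector<float>& psf, int psfWidth,
                std::vector<float>& residual, int width, int g_niters, float gain) {
    float psfPeakVal = 0.0f;
    const std::size_t psfPeak = findPeak(psf, psfPeakVal);
    for (int it = 0; it < g_niters; ++it) {
        float peakVal = 0.0f;
        const std::size_t peakIdx = findPeak(residual, peakVal);
        subtractPSF(psf, psfWidth, psfPeak, residual, width, peakIdx, peakVal, gain);
    }
}
```

Both functions are plain serial loops here, which is exactly the situation the following paragraphs set out to parallelize.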
The core flow also shows that the computation of deConvolution is concentrated in the findPeak and subtractPSF functions. In the deConvolution algorithm both functions run serially, so computational efficiency is low: they can neither use the computing resources for thread-level parallelism nor make effective use of the vector processing units for data parallelism.
To solve these problems, the invention partitions the image data into blocks and turns the findPeak and subtractPSF functions of the deConvolution algorithm into parallel computations, processing both with multiple threads. This makes full use of the many compute cores of the KNL platform and improves the computational performance of the deConvolution algorithm.
In addition, to make full use of KNL's many-core resources and improve computational efficiency, the findPeak and subtractPSF computations are parallelized with OpenMP (Open Multi-Processing). To limit threading overhead, the parallelization is applied to the outer for loop, where #pragma omp parallel for spawns the threads that perform the computation. To reduce repeated computation, the calculations the inner loop would otherwise repeat are hoisted into the outer loop, namely the indices lhsIdx and rhsIdx of the image data to be updated. To avoid conflicts between threads, each thread must have its own copies of lhsIdx and rhsIdx, which is achieved with private(lhsIdx, rhsIdx). The specific implementation is as follows:
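The OpenMP pattern described above can be sketched as follows. This is a hypothetical version of subtractPSF (the signature and the exact meaning of lhsIdx and rhsIdx are assumptions). The outer row loop carries #pragma omp parallel for, lhsIdx and rhsIdx are computed once per row in the outer loop instead of once per pixel, and private(lhsIdx, rhsIdx) gives each thread its own copy. Compiled without OpenMP, the pragma is ignored and the result is the same.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical parallel subtractPSF for square, row-major images.
// (rx, ry): image peak; (px, py): PSF peak; scale = gain * peakVal.
void subtractPSFParallel(const std::vector<float>& psf, int psfWidth,
                         std::vector<float>& residual, int width,
                         int rx, int ry, int px, int py,
                         float peakVal, float gain) {
    const int dx = rx - px, dy = ry - py;  // PSF-to-image shift
    const int x0 = std::max(0, dx), x1 = std::min(width - 1, dx + psfWidth - 1);
    const int y0 = std::max(0, dy), y1 = std::min(width - 1, dy + psfWidth - 1);
    const float scale = gain * peakVal;
    int lhsIdx, rhsIdx;  // declared outside the loop, hence the private clause
#pragma omp parallel for private(lhsIdx, rhsIdx)
    for (int y = y0; y <= y1; ++y) {
        lhsIdx = y * width;                 // start of row y in the residual image
        rhsIdx = (y - dy) * psfWidth - dx;  // PSF row, shifted so rhsIdx + x matches lhsIdx + x
        for (int x = x0; x <= x1; ++x)
            residual[lhsIdx + x] -= scale * psf[rhsIdx + x];
    }
}
```

Declaring the two indices inside the loop body would make them private automatically. The explicit clause is kept here to mirror the text.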
In a specific implementation, to guide the compiler to vectorize the inner loops, the preprocessing directive "#pragma simd" is added in front of the inner loop, and the compile option "-xMIC-AVX512" is added. This makes full use of KNL's 512-bit vector processing units and the AVX-512 instruction set, and the high degree of vectorization improves program performance.
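The inner-loop hint can be illustrated with a loop that updates one image row. The text uses the Intel-specific #pragma simd. This sketch substitutes #pragma omp simd, the portable OpenMP 4.0 directive. The function name subtractRow is hypothetical, and the build lines in the comment are example invocations. Of those flags, only -xMIC-AVX512 comes from the text.

```cpp
#include <cassert>
#include <cstddef>

// Example builds (assumed toolchains):
//   icpc -qopenmp -xMIC-AVX512 -O2 ...       (Intel compiler, flag from the text)
//   g++  -fopenmp-simd -march=knl -O2 ...    (GCC)
// "#pragma omp simd" tells the compiler the iterations are independent,
// so dst and src must not overlap.
void subtractRow(float* dst, const float* src, std::size_t n, float scale) {
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        dst[i] -= scale * src[i];  // one 512-bit AVX-512 register holds 16 floats
}
```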
As shown in Fig. 1, the method for processing image data based on the KNL platform according to an embodiment of the invention includes:
Step 100: obtain the image data to be processed.
In this embodiment, the image data that the deConvolution algorithm needs to process is obtained on the KNL platform.
Step 101: compute the maximum absolute pixel value of the image data with a multi-threaded parallel algorithm, and determine, from the computed maximum, the index value of the maximum absolute pixel value of the image data. The index value indicates the position of that maximum in the image data.
In this embodiment, the maximum absolute pixel value of the image data is computed with a multi-threaded parallel algorithm, and its index value is determined from the computed maximum.
The maximum absolute pixel value of the image data is computed by the findPeak function. Optionally, the whole image data can first be divided, according to the number of executable threads, into as many basic data blocks as there are threads, so that the data are processed by multiple threads in parallel.
In some optional implementations of this embodiment, the image data is first divided into multiple basic data blocks when computing the maximum absolute pixel value with a multi-threaded parallel algorithm. Optionally, the number of blocks can be determined from the data size and the number of threads computing in parallel. For example, tests showed that 64 threads on the KNL machine gave the highest efficiency. Execution is therefore set to 64 parallel threads, and the image is partitioned by the number of executable threads, that is, into 64 basic data blocks that 64 threads process in parallel.
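A hypothetical helper for this partitioning is sketched below. It splits the pixels into nBlocks contiguous basic data blocks (64 in the tested configuration) whose sizes differ by at most one, so the pixel count does not need to be a multiple of the thread count. The name blockRange and the contiguous-range layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Returns the half-open range [begin, end) of basic data block b when
// `total` pixels are split into `nBlocks` nearly equal contiguous blocks.
// The first (total % nBlocks) blocks each receive one extra pixel.
std::pair<std::size_t, std::size_t> blockRange(std::size_t total, std::size_t nBlocks,
                                               std::size_t b) {
    const std::size_t base = total / nBlocks, rem = total % nBlocks;
    const std::size_t begin = b * base + (b < rem ? b : rem);
    const std::size_t end = begin + base + (b < rem ? 1 : 0);
    return {begin, end};
}
```

For example, 10 pixels in 3 blocks give the ranges [0, 4), [4, 7) and [7, 10).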
In some optional implementations of this embodiment, the basic data blocks are processed in parallel by multiple threads, each thread using the findPeak function to find the maximum absolute pixel value in its basic data block and the index value of that maximum.
Fig. 2 shows the flow of finding, with the findPeak function, the maximum absolute pixel value in each basic data block and the index value of that maximum. It includes the following steps: first traverse every value of the image data in the basic data block and obtain, by comparison, the maximum absolute pixel value of the block; then find, among the maxima of the basic data blocks, the maximum absolute pixel value of the whole image data and its index value. The specific implementation steps are:
In some optional implementations of this embodiment, once the maximum absolute pixel value in each basic data block and its index value have been found, the block maximum and its index value found by each thread are stored, according to the thread's identifier, into the corresponding temporary memory space, forming temporary arrays. The temporary arrays include a first array and a second array: the first array holds the maximum absolute pixel value of each basic data block, one per block, and the second array holds the index value of each of those maxima, one per block.
In some optional implementations of this embodiment, two temporary memory spaces are created in the memory of the KNL machine before findPeak is used to find the maximum absolute pixel value in each basic data block and its index value is determined. The reason is as follows. Without them, comparing the maxima of the basic data blocks would require a "critical section" to avoid conflicts, and each thread would execute the critical-section code serially. The threads would then compete for one critical section: only one thread can be inside it at a time while the others wait, which takes a long time. Creating two temporary memory spaces avoids using a critical section in the multithreaded code and the performance limit it imposes. It also eliminates unnecessary waiting between threads and improves the run-time performance of the algorithm.
In some optional implementations of this embodiment, the method further includes:
determining, from the first array, the largest of the maximum absolute pixel values of the basic data blocks and defining it as the first maximum; the first maximum is the maximum absolute pixel value of the image data.
After the threads have finished processing the basic data blocks in parallel, the maximum and maximum index found by each thread are stored in the temporary arrays. Finally, a serial algorithm determines the first maximum from the first array.
In some optional implementations of this embodiment, the first maximum is found in the temporary arrays with a serial algorithm. Because the temporary arrays hold only 64 entries, the maximum absolute pixel value of the image data is computed serially. The specific program is as follows:
In this embodiment, the serial pass adds little overhead. With the original approach, by contrast, parallel efficiency would drop greatly even though multithreaded parallelism is used. Determining the first maximum from the first array serially is therefore the more efficient choice.
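The serial step can be as simple as a linear scan over the first array. The sketch below, with the hypothetical name reducePeak, returns the image index of the first maximum (taken from the second array) and writes the first maximum itself to firstMax. With 64 entries this pass is negligible next to the parallel scan of the image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reduction over the two temporary arrays.
// Precondition: blockMax is non-empty and blockIdx has the same length.
// On ties the earliest block wins.
std::size_t reducePeak(const std::vector<float>& blockMax,
                       const std::vector<std::size_t>& blockIdx,
                       float& firstMax) {
    std::size_t best = 0;
    for (std::size_t b = 1; b < blockMax.size(); ++b)
        if (blockMax[b] > blockMax[best]) best = b;
    firstMax = blockMax[best];  // the first maximum (largest |pixel| overall)
    return blockIdx[best];      // its index in the whole image
}
```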
In some optional implementations of this embodiment, an allocator template is created before the two temporary memory spaces are created in memory. The template has two parameters: the data type to allocate and the alignment in bytes. The purpose in this embodiment is faster memory access. KNL provides MCDRAM on-package high-bandwidth memory whose access bandwidth can reach 400 to 500 GB/s, roughly 5 to 6 times that of ordinary DDR4 memory. To improve memory access efficiency, the temporary arrays are therefore placed in MCDRAM. Allocating MCDRAM space on KNL, however, requires calling a dedicated API library. An allocator template is therefore created and passed in when the allocation is made.
First, the allocator template "template<typename T, std::size_t Alignment> class aligned_allocater" is created. Its two parameters are the data type to allocate and the alignment in bytes. The first parameter, T, is the data type to allocate, for example a floating-point or integer type. The second parameter is the alignment in bytes. Tests showed that 64-byte alignment works best on the KNL processor, because it favours vectorization and improves its efficiency. (64 bytes is both the cache-line size and the width of a 512-bit vector register.)
When the allocator template allocates space in MCDRAM, it calls the MCDRAM allocation function "hbw_posix_memalign()". Using the functions provided for MCDRAM requires including the corresponding header "hbwmalloc.h". To use the allocator template, "vector<float>" is changed to "vector<float, aligned_allocater<float,64>>", and the link option -lmemkind is added at link time. This places the temporary arrays in MCDRAM high-bandwidth memory. The specific implementation program is as follows:
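A sketch of such an allocator is given below. The class name aligned_allocater follows the spelling in the text. hbw_posix_memalign and hbw_free are functions of the memkind library, declared in hbwmalloc.h. The USE_HBWMALLOC switch and the posix_memalign fallback for machines without MCDRAM are assumptions, added so the sketch builds anywhere. With USE_HBWMALLOC defined, link with -lmemkind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#include <vector>
#ifdef USE_HBWMALLOC  // hypothetical build switch; link with -lmemkind
#include <hbwmalloc.h>
#endif

// Minimal C++11 allocator returning Alignment-byte aligned storage,
// from MCDRAM when USE_HBWMALLOC is defined, else from ordinary memory.
template <typename T, std::size_t Alignment>
class aligned_allocater {
    static_assert((Alignment & (Alignment - 1)) == 0 && Alignment >= sizeof(void*),
                  "Alignment must be a power of two and at least sizeof(void*)");
public:
    using value_type = T;
    // Explicit rebind is required because Alignment is a non-type parameter.
    template <typename U> struct rebind { using other = aligned_allocater<U, Alignment>; };

    aligned_allocater() noexcept = default;
    template <typename U>
    aligned_allocater(const aligned_allocater<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = nullptr;
#ifdef USE_HBWMALLOC
        const int rc = hbw_posix_memalign(&p, Alignment, n * sizeof(T));
#else
        const int rc = posix_memalign(&p, Alignment, n * sizeof(T));
#endif
        if (rc != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept {
#ifdef USE_HBWMALLOC
        hbw_free(p);
#else
        free(p);
#endif
    }
};

// Stateless allocators of the same alignment are always interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocater<T, A>&, const aligned_allocater<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocater<T, A>&, const aligned_allocater<U, A>&) noexcept { return false; }

// The replacement for vector<float> described in the text.
using AlignedFloats = std::vector<float, aligned_allocater<float, 64>>;
```

Usage: the temporary arrays are declared as AlignedFloats (or another std::vector with this allocator) instead of std::vector<float>. The rest of the code is unchanged.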
Step 102: compute, from the index value of the maximum absolute pixel value, the image region of the image data that needs to be updated.
In this embodiment, whose algorithm flowchart is shown in Fig. 3, this computation determines the region that needs to be updated from the index value corresponding to the first maximum and the subtractPSF function. The index value corresponding to the first maximum is the index value of the maximum absolute pixel value of the image data, and the first maximum is that maximum value itself.
The algorithm flowchart in Fig. 3 shows how the image data is updated. The algorithm uses the index of the maximum value in the image data to compute the image region that needs to be updated, and then updates the image data in that region.
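The region computation can be sketched as follows. The sketch assumes square, row-major images and a PSF whose own peak is aligned with the image peak. These are assumptions: the text states only that the region follows from the peak index and subtractPSF. The hypothetical function updateRegion returns the inclusive bounds of the shifted PSF footprint, clipped to the image, together with the shift (dx, dy).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Inclusive pixel bounds of the region to update, plus the PSF-to-image shift.
struct Region { int x0, x1, y0, y1, dx, dy; };

// peakIdx: index of the image peak in a width x width image.
// psfPeak: index of the PSF's own peak in a psfWidth x psfWidth PSF.
Region updateRegion(std::size_t peakIdx, int width, std::size_t psfPeak, int psfWidth) {
    const int rx = static_cast<int>(peakIdx % width), ry = static_cast<int>(peakIdx / width);
    const int px = static_cast<int>(psfPeak % psfWidth), py = static_cast<int>(psfPeak / psfWidth);
    const int dx = rx - px, dy = ry - py;  // PSF pixel (u, v) lands on image pixel (u + dx, v + dy)
    return {std::max(0, dx), std::min(width - 1, dx + psfWidth - 1),
            std::max(0, dy), std::min(width - 1, dy + psfWidth - 1), dx, dy};
}
```

For a peak in a corner, the footprint is clipped. In an 8 x 8 image with a 3 x 3 PSF, a peak at index 0 gives the 2 x 2 region x, y in [0, 1].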
Step 103:Using multi-threaded parallel algorithm, renewal needs the view data of the image-region updated.
In this embodiment, updating the image data of the region with a multi-threaded parallel algorithm includes: first, dividing the image data of the region that needs updating into multiple sub-blocks; then, processing the sub-blocks in parallel with multiple threads, each thread using the subtractPSF function to update the image data of its corresponding sub-block. Concretely, each thread concurrently and independently accesses its own sub-block and computes the updated image data for it; together, the threads complete the update of the image data across the entire region.
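A sketch of Step 103 under the same Högbom-style assumption: each sub-block is taken to be one row of the update region (the document does not fix the block shape), OpenMP distributes the rows to threads, and because rows are disjoint no synchronisation is needed. The function name subtract_psf_region and its parameters are hypothetical; scale stands for the factor (for example gain times peak value) applied to the PSF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: the update region [x0, x1) x [y0, y1) is assumed known
// from Step 102, and (dx, dy) maps image pixel (x, y) to PSF pixel
// (x - dx, y - dy). The caller guarantees the region lies inside both arrays.
// Without -fopenmp the pragmas are ignored and the loops run serially.
void subtract_psf_region(std::vector<float>& img, std::size_t img_w,
                         const std::vector<float>& psf, std::size_t psf_w,
                         long dx, long dy,
                         long x0, long y0, long x1, long y1,
                         float scale) {
    assert(img.size() % img_w == 0 && psf.size() == psf_w * psf_w);
    // Each row of the region is one sub-block; threads never share a row.
    #pragma omp parallel for schedule(static)
    for (long y = y0; y < y1; ++y) {
        float* out = &img[y * img_w];
        const float* in = &psf[(y - dy) * psf_w];
        // Contiguous inner loop: a candidate for SIMD vectorization.
        #pragma omp simd
        for (long x = x0; x < x1; ++x)
            out[x] -= scale * in[x - dx];
    }
}
```

Row-wise blocks keep each thread's inner loop contiguous in memory, which is what the later AVX-512 vectorization step relies on.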
Step 104: Output the updated image data.
In this embodiment, once the threads have finished updating the image data of the entire region that needs updating, the updated image data is output.
It should be noted that the above is only a specific embodiment of the present invention; embodiments identical or similar to it, and variations of it, all fall within the protection scope of the present invention.
In addition, this application provides an embodiment of a device for processing image data based on the KNL platform. This device embodiment corresponds to the method embodiment shown in Fig. 1, and the device can be applied in various electronic devices.
As shown in Fig. 4, the device for processing image data based on the KNL platform in this embodiment includes an acquisition module, a first processing module, a second processing module and an output module. The acquisition module obtains the image data to be processed. The first processing module calculates the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm and, from the calculated maximum, determines the index value of the maximum pixel absolute value of the image data; this index value represents the position of the maximum pixel absolute value in the image data. The second processing module calculates, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated. The output module outputs the updated image data.
Preferably, the first processing module calculates the maximum pixel absolute value of the image data with the multi-threaded parallel algorithm by: dividing the image data obtained by the acquisition module into multiple basic data blocks; and processing the data blocks in parallel with multiple threads, each thread using the findPeak function to find the maximum pixel absolute value in its basic data block.
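The source of the findPeak function is not given. A hypothetical per-block kernel that does what the text describes, returning the largest absolute pixel value in one basic data block together with its flat index in the whole image, could be sketched as follows (Peak and find_peak_block are invented names):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: largest |pixel| in a block and its flat index.
struct Peak { float value; std::size_t index; };

// Scan one basic data block [begin, end) of a row-major image.
Peak find_peak_block(const std::vector<float>& img, std::size_t begin, std::size_t end) {
    assert(begin < end && end <= img.size());
    Peak best{std::fabs(img[begin]), begin};
    for (std::size_t i = begin + 1; i < end; ++i) {
        const float a = std::fabs(img[i]);
        // Strict '>' keeps the first occurrence when values tie.
        if (a > best.value) { best.value = a; best.index = i; }
    }
    return best;
}
```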
Preferably, the first processing module is further configured to create two temporary memory spaces in memory before using the findPeak function to find the maximum pixel absolute value in each basic data block and determining, from the calculated maximum pixel absolute value of the image data, the index value of that maximum.
Preferably, after the maximum pixel absolute value in each basic data block has been found and its index value determined, the maximum found by each thread for its basic data block, together with the index value of that maximum, is stored, according to the thread's identifier, into the corresponding temporary memory spaces, forming temporary arrays.
The temporary arrays comprise a first array and a second array: the first array holds the maximum pixel absolute values, one per basic data block, and the second array holds the index values of those maxima, also one per basic data block.
Preferably, the first processing module is further configured to create an allocator template before creating the two temporary memory spaces in memory. The allocator template has two parameters: the data type of the allocation and the alignment in bytes. The allocator template is called when the two temporary memory spaces are created.
Preferably, the first processing module is further configured to determine, from the first array, the largest of the maxima of the pixel absolute values as the first maximum; the first maximum is the maximum pixel absolute value of the image data.
Preferably, the first processing module is further configured to determine the first maximum from the first array using a serial algorithm.
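The following hypothetical C++/OpenMP sketch combines these steps: the image is split into one contiguous basic data block per thread, each thread writes its local maximum and index into its own slot of two temporary arrays (indexed by thread id), and the first maximum is then chosen serially from those arrays. It illustrates the scheme under these assumptions and is not the patent's code; find_peak_parallel and its parameters are invented names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumes a non-empty image. Compiled without OpenMP, it runs with one "thread".
void find_peak_parallel(const std::vector<float>& img, float& max_val, std::size_t& max_pos) {
    assert(!img.empty());
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    // Temporary arrays, one slot per thread; -1 marks an unused slot
    // because every |pixel| is >= 0.
    std::vector<float> block_max(nthreads, -1.0f);    // first array: maxima
    std::vector<std::size_t> block_idx(nthreads, 0);  // second array: indices

    #pragma omp parallel
    {
        int tid = 0, nt = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nt = omp_get_num_threads();
#endif
        // Contiguous basic data block owned by this thread.
        const std::size_t n = img.size();
        const std::size_t begin = n * tid / nt, end = n * (tid + 1) / nt;
        float local_max = -1.0f;
        std::size_t local_idx = 0;
        for (std::size_t i = begin; i < end; ++i) {
            const float a = std::fabs(img[i]);
            if (a > local_max) { local_max = a; local_idx = i; }
        }
        // Each thread writes only its own slot: no lock or critical section.
        block_max[tid] = local_max;
        block_idx[tid] = local_idx;
    }

    // Serial search over the small temporary arrays for the first maximum.
    max_val = block_max[0];
    max_pos = block_idx[0];
    for (int t = 1; t < nthreads; ++t)
        if (block_max[t] > max_val) { max_val = block_max[t]; max_pos = block_idx[t]; }
}
```

Because no thread ever waits on a shared maximum, the only serial work is a pass over at most as many elements as there are threads (at most 288 on a fully loaded KNL), so its cost is negligible next to the parallel scan.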
Preferably, the second processing module calculates, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated by determining that region from the index value corresponding to the first maximum and the subtractPSF function, where the index value corresponding to the first maximum is the index value of the maximum pixel absolute value of the image data.
Preferably, the second processing module updates the image data of the region that needs updating with the multi-threaded parallel algorithm by: dividing the image data of that region into multiple sub-blocks; and processing the sub-blocks in parallel with multiple threads, each thread using the subtractPSF function to update the image data of its corresponding sub-block.
The method and device for processing image data based on the KNL platform proposed by the present invention split the image data and process it in parallel with multiple threads, solving the problem of serial image processing. Temporary buffers are also allocated, turning the processing of one large array into the processing of small arrays, which avoids threads waiting on one another when the large array is processed directly. The program also uses the on-package high-speed MCDRAM memory of KNL to improve memory-access efficiency, and, to exploit the many VPUs (Vector Processing Units) on KNL, adds compiler directives to the inner loops to instruct the compiler to vectorize them. The specific implementation includes: (1) implementing the deConvolution algorithm on the KNL computing platform; (2) multi-threaded parallelization; (3) allocating temporary buffers to avoid inter-thread waiting; (4) allocating MCDRAM high-speed memory; (5) AVX-512 vectorization of the core loop level. In detail:
(1) Implementing the deConvolution algorithm on the KNL computing platform: KNL (Knights Landing) is Intel's second-generation Xeon Phi many-core processor. It carries up to 72 general-purpose CPU cores, each supporting 4 concurrent threads, for a maximum of 288 concurrently executing threads. Compared with the first-generation Xeon Phi many-core processor KNC (Knights Corner), KNL's biggest distinguishing feature is that it can run as the host processor rather than only as a coprocessor that assists the CPU. The host-processor mode greatly improves the portability and convenience of programs on the KNL platform, and it avoids copying data back and forth over PCI-E between host memory and the accelerator, eliminating the time spent on data migration. This is also KNL's biggest advantage over other coprocessors.
(2) OpenMP multi-threaded parallelization: OpenMP (Open Multi-Processing) parallelization is used in two places. The first is the search, by each thread, for the maximum and its index within the part of the image assigned to that thread. The second is the parallel update, by each thread, of the partial image data assigned to it, which together completes the update of the whole image.
(3) Allocating temporary buffers to avoid inter-thread waiting: this mainly concerns the search for the image maximum. First, two temporary arrays are allocated (one stores the maxima, the other stores the indices of the maxima), each with as many elements as there are OpenMP threads. Each thread has its own thread-private local maximum and local-maximum index; each thread searches its own portion of the image data and stores the local maximum it finds, together with its index, in the temporary arrays. After all OpenMP threads have finished, the global maximum and its index are found by a serial search over the arrays. Allocating temporary buffers in this way avoids the time threads would spend waiting if multiple threads searched the complete image data for the maximum directly.
(4) Allocating MCDRAM high-speed memory: MCDRAM (Multi-Channel DRAM) is the on-package high-speed memory of the KNL processor. Its bandwidth is more than 4 times that of ordinary DDR4, and its capacity is up to 16 GB. Through the BIOS, MCDRAM can be configured in three modes: as a third-level cache, as an independent NUMA node, or as a hybrid of the two. The present invention uses the independent NUMA node mode and allocates critical data in MCDRAM, improving memory-access efficiency.
(5) AVX-512 automatic vectorization: with SIMD techniques, the compiler loads and processes multiple data elements at once. KNL's 512-bit vector processing units and the AVX-512 instruction set let the compiler process 16 single-precision floating-point numbers in one instruction (512 / 32 = 16), greatly accelerating computation.
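As an illustration of the kind of inner loop that benefits (an alternative technique, not taken from the document), the sketch below splits the absolute-maximum search into a max-reduction pass, which a compiler can vectorize under an OpenMP simd directive, and a second pass that locates the first index holding that value. The function name abs_argmax_two_pass is hypothetical, and compiler flags such as -xMIC-AVX512 (Intel compiler) or -march=knl -fopenmp-simd (GCC) are examples, not the document's settings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Returns the index of the first element with the largest |value| and stores
// that |value| in max_val. Assumes n > 0 and finite input. Without OpenMP SIMD
// support the pragma is ignored and the result is the same.
std::size_t abs_argmax_two_pass(const float* data, std::size_t n, float& max_val) {
    assert(n > 0);
    // Pass 1: a pure max-reduction with no index bookkeeping, which maps
    // cleanly onto 512-bit vector registers (16 floats per instruction).
    float m = 0.0f;
    #pragma omp simd reduction(max : m)
    for (std::size_t i = 0; i < n; ++i)
        m = std::fmax(m, std::fabs(data[i]));
    max_val = m;
    // Pass 2: stop at the first element whose magnitude equals the maximum.
    for (std::size_t i = 0; i < n; ++i)
        if (std::fabs(data[i]) == m) return i;
    return 0;  // not reached for finite input
}
```

Tracking the index inside the same loop creates a loop-carried dependency on two variables, which compilers often refuse to vectorize; separating the passes trades a second (early-exiting) scan for a fully vectorizable first pass.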
It should be noted that herein, term " comprising ", "comprising" or its any other variant are intended to non-row
His property includes, so that process, method, article or device including a series of elements not only include those key elements, and
And other elements that are not explicitly listed are further included, or further include as this process, method, article or device institute inherently
Key element.In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that including this
Also there are other identical element in the process of key element, method, article or device.
The numbering of the embodiments of the present invention is for description only and does not indicate the relative merit of the embodiments. Any scheme identical or similar to the design concept of the present invention, and any variation of schemes identical or similar to the embodiments of the present invention, fall within the protection scope of the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc), including instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the patent protection scope of the present invention.
Claims (10)
- 1. A method for processing image data based on the KNL platform, characterized in that the method comprises: obtaining image data to be processed; calculating the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining, from the calculated maximum pixel absolute value of the image data, the index value of the maximum pixel absolute value of the image data; calculating, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated; updating, using the multi-threaded parallel algorithm, the image data of the image region that needs to be updated; and outputting the updated image data; wherein the index value of the maximum pixel absolute value represents the position of the maximum pixel absolute value of the image data.
- 2. The method for processing image data based on the KNL platform according to claim 1, characterized in that calculating the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm comprises: dividing the image data into multiple basic data blocks; and processing the multiple data blocks in parallel with multiple threads, wherein each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block.
- 3. The method for processing image data based on the KNL platform according to claim 2, characterized in that the method further comprises: before using the findPeak function to find the maximum pixel absolute value in the basic data block and determining, from the calculated maximum pixel absolute value of the image data, the index value of the maximum pixel absolute value, creating two temporary memory spaces in memory; after finding the maximum pixel absolute value in the basic data block and determining, from the calculated maximum pixel absolute value of the image data, the index value of the maximum pixel absolute value, storing, according to the identifier of each thread, the maximum pixel absolute value found by that thread in its corresponding basic data block and the index value of that maximum into the corresponding temporary memory spaces, forming temporary arrays; wherein the temporary arrays comprise a first array and a second array; the first array contains the maxima of the pixel absolute values in one-to-one correspondence with the multiple basic data blocks, and the second array contains the index values of those maxima in one-to-one correspondence with the multiple basic data blocks.
- 4. The method for processing image data based on the KNL platform according to claim 3, characterized in that the method further comprises: before creating the two temporary memory spaces in the memory, creating an allocator template, wherein the allocator template includes two parameters: the data type of the allocation and the alignment in bytes; and calling the allocator template when creating the two temporary memory spaces.
- 5. The method for processing image data based on the KNL platform according to claim 2, characterized in that the method further comprises: determining, from the first array, the largest of the maxima of the multiple pixel absolute values as a first maximum, the first maximum being the maximum pixel absolute value of the image data.
- 6. The method for processing image data based on the KNL platform according to claim 5, characterized in that the method further comprises: determining the first maximum from the first array using a serial algorithm.
- 7. The method for processing image data based on the KNL platform according to claim 1, characterized in that calculating, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated comprises: determining the image region of the image data that needs to be updated according to the index value corresponding to the first maximum and the subtractPSF function; wherein the index value corresponding to the first maximum is the index value of the maximum pixel absolute value of the image data.
- 8. The method for processing image data based on the KNL platform according to claim 7, characterized in that updating, using the multi-threaded parallel algorithm, the image data of the image region that needs to be updated comprises: dividing the image data of the image region that needs to be updated into multiple sub-blocks; and processing the multiple sub-blocks in parallel with multiple threads, wherein each thread uses the subtractPSF function to update the image data of its corresponding sub-block.
- 9. A device for processing image data based on the KNL platform, characterized in that the device comprises: an acquisition module for obtaining image data to be processed; a first processing module for calculating the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining, from the calculated maximum pixel absolute value of the image data, the index value of the maximum pixel absolute value of the image data; a second processing module for calculating, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated; and an output module for outputting the updated image data; wherein the index value of the maximum pixel absolute value of the image data represents the position of the maximum pixel absolute value of the image data.
- 10. The device for processing image data based on the KNL platform according to claim 9, characterized in that the first processing module calculating the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm comprises: dividing the image data into multiple basic data blocks; and processing the multiple data blocks in parallel with multiple threads, wherein each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711407553.9A | 2017-12-22 | 2017-12-22 | Method and device for processing image data based on the KNL platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108008975A | 2018-05-08 |
Family ID: 62060735
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8213518B1 (en) * | 2006-10-31 | 2012-07-03 | Sony Computer Entertainment Inc. | Multi-threaded streaming data decoding |
US8531725B2 (en) * | 2010-06-08 | 2013-09-10 | Canon Kabushiki Kaisha | Rastering disjoint regions of the page in parallel |
CN102044071A (en) * | 2010-12-28 | 2011-05-04 | 上海大学 | Single-pixel margin detection method based on FPGA |
CN106910157A (en) * | 2017-02-17 | 2017-06-30 | 公安部第研究所 | The image rebuilding method and device of a kind of multistage parallel |
Non-Patent Citations (1)
Title |
---|
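Hou Tianjiao et al.: "An FPGA-based real-time enhancement method for color images", Chinese Journal of Liquid Crystals and Displays (液晶与显示) * |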
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108874547A (en) * | 2018-06-27 | 2018-11-23 | 郑州云海信息技术有限公司 | A kind of data processing method and device of astronomy software Gridding |
CN109062636A (en) * | 2018-07-20 | 2018-12-21 | 浪潮(北京)电子信息产业有限公司 | A kind of data processing method, device, equipment and medium |
CN110879744A (en) * | 2018-09-06 | 2020-03-13 | 第四范式(北京)技术有限公司 | Method and system for executing computation graph by multiple threads |
CN110879744B (en) * | 2018-09-06 | 2022-08-16 | 第四范式(北京)技术有限公司 | Method and system for executing computation graph by multiple threads |
CN111429413A (en) * | 2020-03-18 | 2020-07-17 | 中国建设银行股份有限公司 | Image segmentation method and device and computer readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20190222. Address after: 100085 Beijing Haidian District Shangdi Information Road 2-1 C Building 1 Floor. Applicant after: INSPUR (BEIJING) ELECTRONIC INFORMATION INDUSTRY Co.,Ltd. Address before: Room 1601, floor 16, 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province. Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180508 |