CN108008975A - Method and device for processing image data based on the KNL platform - Google Patents

Info

Publication number
CN108008975A
Authority
CN
China
Prior art keywords
maximum
absolute value
image data
pixel absolute
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711407553.9A
Other languages
Chinese (zh)
Inventor
黄雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201711407553.9A priority Critical patent/CN108008975A/en
Publication of CN108008975A publication Critical patent/CN108008975A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the invention discloses a method and device for processing image data based on the KNL platform. The method includes: obtaining image data to be processed; computing the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining the index value of that maximum from the computed maximum; computing, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated; updating the image data of that image region using a multi-threaded parallel algorithm; and outputting the updated image data. The embodiment achieves multi-threaded parallel processing of image data, effectively improves memory access efficiency, and greatly improves the computing performance of the algorithm on CPU computing platforms.

Description

A method and device for processing image data based on the KNL platform
Technical field
The present invention relates to science data processing, and in particular to a method and device for processing image data based on the KNL platform.
Background art
Science data processing (SDP, Science Data Processing) is the most critical part of the current international astronomy project, the Square Kilometer Array (SKA) telescope, and poses a great challenge to humanity's data processing capability. It mainly processes the astronomical signals collected by the telescope: the signals pass through detection and amplification, digitization, and correlation, and are finally converted into astronomical images for astronomers to study. The main algorithms involved in SDP include gridding and degridding ((de)Gridding), deconvolution ((de)Convolution), and the fast Fourier transform (FFT, Fast Fourier Transformation).
The deConvolution algorithm involved in SDP is one of the key algorithms in the science data processing of the SKA project. According to statistics in the SKA design documents, the deConvolution algorithm accounts for about 20% of the total SDP computation. The computation required by the SKA project is enormous: the required computing capability is about 5 times that of Sunway TaihuLight, currently the fastest supercomputer in the world.
In the related art, the deConvolution algorithm runs serially on the many-core processor (KNL, Knights Landing), which has the following shortcomings: (1) computing resources cannot be used effectively, and the program takes a long time to run; (2) when the algorithm runs on a CPU computing platform, memory access efficiency is low and computing capability is weak.
Summary of the invention
To solve the above technical problems, an embodiment of the present invention provides a method and device for processing image data based on the KNL platform, which can use computing resources effectively, reduce program run time, improve memory access efficiency, and improve computing performance on CPU computing platforms.
To achieve the object of the invention, an embodiment of the present invention provides a method for processing image data based on the KNL platform, including:
Obtaining image data to be processed;
Computing the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining, from the computed maximum, the index value of the maximum pixel absolute value of the image data;
Computing, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated;
Updating the image data of the image region that needs to be updated, using a multi-threaded parallel algorithm;
Outputting the updated image data;
Wherein the index value of the maximum pixel absolute value indicates the position of the maximum pixel absolute value in the image data.
Preferably, the method further includes:
Computing the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm includes:
Dividing the image data into multiple basic data blocks;
Processing the multiple basic data blocks in parallel with multiple threads, wherein each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block.
Preferably, the method further includes:
Before using the findPeak function to find the maximum pixel absolute value in a basic data block and determining the index value of the maximum pixel absolute value from the computed maximum, creating two temporary memory spaces in memory;
After finding the maximum pixel absolute value in a basic data block and determining its index value, storing, according to each thread's identifier, the maximum pixel absolute value found by that thread in its basic data block and the index value of that maximum into the corresponding temporary memory spaces, forming temporary arrays;
Wherein the temporary arrays include a first array and a second array; the first array holds the maximum pixel absolute values in one-to-one correspondence with the basic data blocks, and the second array holds the index values of those maxima, likewise in one-to-one correspondence with the basic data blocks.
Preferably, the method further includes:
Before creating the two temporary memory spaces in memory, creating an allocator template, wherein the allocator template takes two parameters: the data type to allocate and the alignment size in bytes;
When creating the two temporary memory spaces, calling the allocator template.
Preferably, the method further includes:
Determining, from the first array, the largest of the multiple maximum pixel absolute values and defining it as the first maximum; the first maximum is the maximum pixel absolute value of the image data.
Preferably, the method further includes:
Determining the first maximum from the first array using a serial algorithm.
Preferably, computing, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated includes:
Determining the image region of the image data that needs to be updated according to the index value corresponding to the first maximum and the subtractPSF function;
Wherein the index value corresponding to the first maximum is the index value of the maximum pixel absolute value of the image data.
Preferably, updating the image data of the image region that needs to be updated, using a multi-threaded parallel algorithm, includes:
Dividing the image data of the image region that needs to be updated into multiple sub-data blocks;
Processing the multiple sub-data blocks in parallel with multiple threads, wherein each thread uses the subtractPSF function to update the image data of its corresponding sub-data block.
In addition, to achieve the above object, the invention further provides a device for processing image data based on the KNL platform, characterized in that the device includes:
An acquisition module, for obtaining image data to be processed;
A first processing module, for computing the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining, from the computed maximum, the index value of the maximum pixel absolute value of the image data;
A second processing module, for computing, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated;
An output module, for outputting the updated image data;
Wherein the index value of the maximum pixel absolute value of the image data indicates the position of the maximum pixel absolute value in the image data.
Preferably, the first processing module computing the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm includes:
Dividing the image data into multiple basic data blocks;
Processing the multiple basic data blocks in parallel with multiple threads, wherein each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block.
Preferably, the first processing module is further used for:
Before using the findPeak function to find the maximum pixel absolute value in a basic data block and determining the index value of the maximum pixel absolute value from the computed maximum, creating two temporary memory spaces in memory;
After finding the maximum pixel absolute value in a basic data block and determining its index value, storing, according to each thread's identifier, the maximum pixel absolute value found by that thread in its basic data block and the index value of that maximum into the corresponding temporary memory spaces, forming temporary arrays;
Wherein the temporary arrays include a first array and a second array; the first array holds the maximum pixel absolute values in one-to-one correspondence with the basic data blocks, and the second array holds the index values of those maxima, likewise in one-to-one correspondence with the basic data blocks.
Preferably, the first processing module is further used for:
Before creating the two temporary memory spaces in memory, creating an allocator template, wherein the allocator template takes two parameters: the data type to allocate and the alignment size in bytes;
When creating the two temporary memory spaces, calling the allocator template.
Preferably, the first processing module is further used for:
Determining, from the first array, the largest of the multiple maximum pixel absolute values and defining it as the first maximum; the first maximum is the maximum pixel absolute value of the image data.
Preferably, the first processing module is further used for:
Determining the first maximum from the first array using a serial algorithm.
Preferably, the second processing module computing, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated includes:
Determining the image region of the image data that needs to be updated according to the index value corresponding to the first maximum and the subtractPSF function;
Wherein the index value corresponding to the first maximum is the index value of the maximum pixel absolute value of the image data.
Preferably, updating the image data of the image region that needs to be updated, using a multi-threaded parallel algorithm, includes:
Dividing the image data of the image region that needs to be updated into multiple sub-data blocks;
Processing the multiple sub-data blocks in parallel with multiple threads, wherein each thread uses the subtractPSF function to update the image data of its corresponding sub-data block.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for processing image data based on the KNL platform.
The method for processing image data based on the KNL platform proposed by the present invention includes: obtaining image data to be processed; computing the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining, from the computed maximum, the index value of the maximum pixel absolute value of the image data; computing, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated; updating the image data of that image region using a multi-threaded parallel algorithm; and outputting the updated image data. The solution of the present invention addresses the problems that the current deConvolution algorithm has on CPU platforms, such as few computing cores, low memory access efficiency, and long run time. It uses computing resources effectively, reduces program run time, improves memory access efficiency, and improves computing performance on CPU computing platforms.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.
Brief description of the drawings
The drawings are provided for further understanding of the technical solution of the present invention and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution of the invention and do not limit it.
Fig. 1 is a flowchart of the method for processing image data based on the KNL platform according to an embodiment of the present invention;
Fig. 2 is a flow diagram of the findPeak function algorithm;
Fig. 3 is a flow diagram of the subtractPSF function algorithm;
Fig. 4 is a schematic diagram of the device for processing image data based on the KNL platform according to an embodiment of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the drawings. It should be noted that, where there is no conflict, the embodiments in the present application and the features in the embodiments can be combined with one another.
The steps shown in the flowcharts of the drawings can be performed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be performed in an order different from the one given here.
The deConvolution algorithm involved in SDP is one of the key algorithms in the science data processing of the SKA project. The core computation flow of the deConvolution algorithm is as follows:
As the core computation flow of the deConvolution algorithm makes clear, the computation core consists of g_niters iterations, and each iteration performs the following operations: (1) the findPeak function finds the maximum of the whole image data and the index of that maximum; (2) in the subtractPSF function, the image data that needs to be updated is located according to the maximum found and its index; (3) the image data that needs to be updated is updated, and the updated image data is output.
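The three operations of each iteration can be illustrated with a minimal serial sketch. It is not the listing of the original program; the function names findPeak and subtractPSF and the iteration count come from the text, while the row-major image layout, the `gain` factor, the `model` accumulator, and all signatures are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Serial search for the pixel with the largest absolute value.
// peakVal receives the signed pixel value, peakPos its flat (row-major) index.
void findPeak(const std::vector<float>& image, float& peakVal, std::size_t& peakPos) {
    assert(!image.empty());
    peakVal = image[0];
    peakPos = 0;
    for (std::size_t i = 1; i < image.size(); ++i) {
        if (std::fabs(image[i]) > std::fabs(peakVal)) {
            peakVal = image[i];
            peakPos = i;
        }
    }
}

// Subtract gain * peakVal * PSF from the residual, with the PSF shifted so
// that its own peak sits on the residual peak. Only the overlap is touched.
void subtractPSF(const std::vector<float>& psf, long psfWidth,
                 std::vector<float>& residual, long imgWidth,
                 std::size_t peakPos, std::size_t psfPeakPos,
                 float peakVal, float gain) {
    const long imgHeight = static_cast<long>(residual.size()) / imgWidth;
    const long psfHeight = static_cast<long>(psf.size()) / psfWidth;
    const long rPos = static_cast<long>(peakPos);
    const long pPos = static_cast<long>(psfPeakPos);
    const long dx = rPos % imgWidth - pPos % psfWidth;  // horizontal PSF shift
    const long dy = rPos / imgWidth - pPos / psfWidth;  // vertical PSF shift
    for (long y = std::max(0L, dy); y < std::min(imgHeight, dy + psfHeight); ++y) {
        for (long x = std::max(0L, dx); x < std::min(imgWidth, dx + psfWidth); ++x) {
            residual[y * imgWidth + x] -= gain * peakVal * psf[(y - dy) * psfWidth + (x - dx)];
        }
    }
}

// Core loop; niters plays the role of g_niters in the text.
void deconvolve(std::vector<float>& residual, long imgWidth,
                const std::vector<float>& psf, long psfWidth,
                std::vector<float>& model, unsigned niters, float gain) {
    float psfPeakVal = 0.0f;
    std::size_t psfPeakPos = 0;
    findPeak(psf, psfPeakVal, psfPeakPos);
    for (unsigned it = 0; it < niters; ++it) {
        float peakVal = 0.0f;
        std::size_t peakPos = 0;
        findPeak(residual, peakVal, peakPos);           // operation (1)
        model[peakPos] += gain * peakVal;               // component record (assumption)
        subtractPSF(psf, psfWidth, residual, imgWidth,  // operations (2) and (3)
                    peakPos, psfPeakPos, peakVal, gain);
    }
}
```

In this sketch both helper functions are serial, which corresponds to the starting point that the invention sets out to parallelize.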
In addition, the core computation flow of the deConvolution algorithm shows that its core computation lies mainly in the findPeak and subtractPSF functions. Specifically, in the deConvolution algorithm the findPeak and subtractPSF functions are computed serially, so computational efficiency is low: computing resources cannot be used effectively for thread-parallel computation, and the vector processing units likewise cannot be used effectively for data-parallel computation.
To solve these existing problems, the present invention partitions the image data into blocks and turns the findPeak and subtractPSF functions of the deConvolution algorithm into parallel computations. The findPeak and subtractPSF computations are processed by multiple threads in parallel, making full use of the many computing cores of the KNL platform to improve the computing performance of the deConvolution algorithm.
In addition, to make full use of the KNL many-core resources and improve the computational efficiency of the algorithm, the findPeak and subtractPSF computations are processed in parallel with OpenMP (Open Multi-Processing) multithreading. To limit the overhead of creating threads, multithreading is applied to the outer for loop, where #pragma omp parallel for is set to start multiple threads for the computation. To reduce repeated computation, some repeated calculations in the inner loop are hoisted into the outer loop, namely the indices lhsIdx and rhsIdx of the image data to be updated. To avoid conflicts between threads, "lhsIdx" and "rhsIdx" must be private to each thread, which is done by setting private(lhsIdx, rhsIdx). The specific program is as follows:
In a specific implementation, to direct the compiler to vectorize the inner loop, the preprocessing directive "#pragma simd" is added before the inner loop, and "-xMIC-AVX512" is added to the compile options. This makes full use of the 512-bit vector processing units and the AVX512 instruction set of KNL, and the high degree of vectorization improves program performance.
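The loop structure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original listing: the function name `subtractPSFRows`, its parameters, and the row-major layout are hypothetical, and the portable `#pragma omp simd` is used here in place of the Intel-specific `#pragma simd` named in the text.

```cpp
#include <cassert>
#include <vector>

// Update columns [x0, x1) of rows [y0, y1) of the residual image by
// subtracting scale * PSF, where the PSF is shifted by (dx, dy) pixels.
// The caller guarantees that the region lies inside both images.
void subtractPSFRows(const std::vector<float>& psf, long psfWidth,
                     std::vector<float>& residual, long imgWidth,
                     long x0, long x1, long y0, long y1,
                     long dx, long dy, float scale) {
    assert(x0 <= x1 && y0 <= y1);
    long lhsIdx = 0;  // start of the current row in the residual image
    long rhsIdx = 0;  // start of the matching row in the PSF
    // Threads are created once for the outer loop; each thread gets its own
    // private copy of lhsIdx and rhsIdx, so rows never interfere.
    #pragma omp parallel for private(lhsIdx, rhsIdx)
    for (long y = y0; y < y1; ++y) {
        // Row offsets hoisted out of the inner loop: computed once per row.
        lhsIdx = y * imgWidth + x0;
        rhsIdx = (y - dy) * psfWidth + (x0 - dx);
        // Contiguous, dependence-free inner loop that the compiler can vectorize.
        #pragma omp simd
        for (long i = 0; i < x1 - x0; ++i) {
            residual[lhsIdx + i] -= scale * psf[rhsIdx + i];
        }
    }
}
```

Built without OpenMP support, the pragmas are ignored and the function runs serially with the same result, which makes the sketch easy to check before enabling the compiler's OpenMP and vectorization options.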
The method for processing image data based on the KNL platform according to an embodiment of the present invention, as shown in Fig. 1, includes:
Step 100: Obtain image data to be processed;
In this embodiment, the image data to be processed by the deConvolution algorithm is obtained on the KNL platform.
Step 101: Compute the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determine, from the computed maximum, the index value of the maximum pixel absolute value of the image data; wherein the index value of the maximum pixel absolute value indicates the position of the maximum pixel absolute value in the image data.
In this embodiment, the maximum pixel absolute value of the image data is computed using a multi-threaded parallel algorithm, and the index value of that maximum is determined from the computed maximum.
The maximum pixel absolute value of the image data is computed by the findPeak function. Optionally, the whole image data can first be divided, according to the number of executable threads, into as many basic data blocks as there are threads, so that the data is processed by multiple threads in parallel.
In some optional implementations of this embodiment, to compute the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, the image data is first divided into multiple basic data blocks. Optionally, the number of blocks can be determined from the data volume and the number of threads running in parallel. For example, testing showed that processing the image data with 64 threads is most efficient on the KNL platform machine, so execution is set to 64 parallel threads; partitioning by the number of executable threads, the image data is divided into 64 basic data blocks, which are processed by 64 threads in parallel.
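One way to divide the image into basic data blocks is to give each block a contiguous range of flat pixel indices. The sketch below is an assumption about how such a partition could be computed; the helper name `blockRange` and the choice of contiguous, near-equal ranges are illustrative and are not stated in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [begin, end) of flat pixel indices covered by block b
// when nPixels pixels are split into nBlocks contiguous blocks.
// If the split is uneven, the first (nPixels % nBlocks) blocks get one extra pixel.
std::pair<std::size_t, std::size_t> blockRange(std::size_t nPixels,
                                               std::size_t nBlocks,
                                               std::size_t b) {
    assert(nBlocks > 0 && b < nBlocks);
    const std::size_t base = nPixels / nBlocks;
    const std::size_t extra = nPixels % nBlocks;
    const std::size_t begin = b * base + std::min(b, extra);
    const std::size_t end = begin + base + (b < extra ? 1 : 0);
    return {begin, end};
}
```

With 64 blocks, as in the example above, each thread would call `blockRange(nPixels, 64, b)` for its own block number `b` and scan only that range.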
In some optional implementations of this embodiment, the multiple basic data blocks are processed in parallel by multiple threads, wherein each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block and the index value of that maximum.
Fig. 2 shows the flow of using the findPeak function to find the maximum pixel absolute value in each basic data block and the index value of that maximum. Specifically: first, every value of the image data in a basic data block is traversed and the maximum pixel absolute value of the block is obtained by comparison; then the maximum pixel absolute value of the whole image data and its corresponding index value are found among the multiple basic data blocks. The specific implementation steps are:
In some optional implementations of this embodiment, after the maximum pixel absolute value in each basic data block has been found and its index value determined from the computed maximum, the maximum found by each thread in its basic data block and the index value of that maximum are stored, according to each thread's identifier, into the corresponding temporary memory spaces, forming temporary arrays. The temporary arrays include a first array and a second array; the first array holds the maximum pixel absolute values in one-to-one correspondence with the basic data blocks, and the second array holds the index values of those maxima, likewise in one-to-one correspondence with the basic data blocks.
In some optional implementations of this embodiment, before the findPeak function is used to find the maximum pixel absolute value in a basic data block and the index value of that maximum is determined, two temporary memory spaces are created in the memory of the KNL platform machine. The purpose of creating them is as follows. Comparing the maxima of the multiple basic data blocks would otherwise require a "critical section": to avoid conflicts, every thread would execute the critical-section code serially. This makes multiple threads compete for one critical section; only one thread can be inside it at a time while the others wait, which takes a long time. In this embodiment, two temporary memory spaces are created in memory instead, which avoids the performance limit that a critical section imposes on the multi-threaded algorithm, eliminates unnecessary waiting between threads, and improves the run-time performance of the algorithm.
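The critical-section-free scheme can be sketched as below. The function name `findBlockPeaks`, the array names `blockMax` and `blockIdx`, and the contiguous block layout are assumptions for illustration. The text indexes the temporary arrays by thread identifier; the sketch indexes them by block number, which is equivalent when each thread handles exactly one block.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Each block writes its local peak into its own slot of the two temporary
// arrays, so no two iterations write the same element and no critical
// section is needed.
void findBlockPeaks(const std::vector<float>& image, int nBlocks,
                    std::vector<float>& blockMax,         // "first array"
                    std::vector<std::size_t>& blockIdx) { // "second array"
    assert(nBlocks > 0 && !image.empty());
    blockMax.assign(static_cast<std::size_t>(nBlocks), 0.0f);
    blockIdx.assign(static_cast<std::size_t>(nBlocks), 0);
    const std::size_t n = image.size();
    const std::size_t nb = static_cast<std::size_t>(nBlocks);
    #pragma omp parallel for
    for (int b = 0; b < nBlocks; ++b) {
        // Contiguous half-open pixel range [begin, end) of block b.
        const std::size_t begin = n * static_cast<std::size_t>(b) / nb;
        const std::size_t end = n * (static_cast<std::size_t>(b) + 1) / nb;
        float localMax = 0.0f;
        std::size_t localIdx = begin;
        for (std::size_t i = begin; i < end; ++i) {
            const float a = std::fabs(image[i]);
            if (a > localMax) {
                localMax = a;
                localIdx = i;
            }
        }
        blockMax[static_cast<std::size_t>(b)] = localMax;
        blockIdx[static_cast<std::size_t>(b)] = localIdx;
    }
}
```

The two output arrays are then small enough to be scanned by a single thread, which is the step the following paragraphs describe.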
In some optional implementations of this embodiment, the method further includes:
Determining, from the first array, the largest of the maximum pixel absolute values of the multiple basic data blocks and defining it as the first maximum; the first maximum is the maximum pixel absolute value of the image data.
After the multiple threads have finished processing the image data of the multiple basic data blocks in parallel, the maximum found by each thread and the index of that maximum are stored in the temporary arrays; finally, the first maximum is determined from the first array using a serial algorithm.
In some optional implementations of this embodiment, the first maximum is found in the temporary array using a serial algorithm.
Because the temporary array holds only 64 values, a serial computation is used to obtain the maximum pixel absolute value of the image data. The program of the specific algorithm is as follows:
In this embodiment, the serial computation does not add much overhead. By contrast, handling this step in the original way, even though multi-threaded parallel processing is still used, would greatly reduce parallel efficiency; determining the first maximum from the first array with a serial algorithm is in fact more efficient.
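The serial step over the temporary arrays amounts to a short linear scan. The following is a minimal sketch, assuming the first array holds the per-block maxima and the second array the matching image indices; the function name `reducePeaks` and the array names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Scan the per-block maxima and return the overall maximum pixel absolute
// value together with its index in the image. With 64 blocks this is only
// 63 comparisons, so running it on one thread costs very little.
std::pair<float, std::size_t> reducePeaks(const std::vector<float>& blockMax,
                                          const std::vector<std::size_t>& blockIdx) {
    assert(!blockMax.empty() && blockMax.size() == blockIdx.size());
    std::size_t best = 0;
    for (std::size_t b = 1; b < blockMax.size(); ++b) {
        if (blockMax[b] > blockMax[best]) {
            best = b;
        }
    }
    return {blockMax[best], blockIdx[best]};
}
```

The first element of the returned pair corresponds to the first maximum, and the second element to the index value that is later passed to the update step.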
In some optional implementations of this embodiment, before the two temporary memory spaces are created in memory, an allocator template is created, wherein the allocator template takes two parameters: the data type to allocate and the alignment size in bytes. In this embodiment, to speed up memory access, KNL is equipped with MCDRAM on-chip high-speed memory; access to MCDRAM can reach 400 GB/s to 500 GB/s, almost 5 to 6 times that of ordinary DDR4 memory. To improve memory access efficiency effectively, the temporary arrays are stored in MCDRAM high-speed memory. However, allocating temporary memory space in MCDRAM on KNL requires calling a dedicated API function library, so an allocator template needs to be created and is invoked at allocation time.
First, the allocator template "template<typename T, std::size_t Alignment> class aligned_allocater" is created. The allocator template takes two parameters: the data type to allocate and the alignment size in bytes. The first parameter, T, is the data type to allocate, which can be a floating-point type, an integer type, and so on. The second parameter is the alignment size in bytes. For example, testing led to choosing 64-byte alignment on KNL, because 64-byte alignment works best on the KNL processor and favors vectorized operations, improving vectorization efficiency.
When the allocator template is used to allocate space in MCDRAM, the MCDRAM allocation function "hbw_posix_memalign()" is called; to call the functions provided for MCDRAM, the corresponding header file "hbwmalloc.h" is included. When using the allocator template, "vector<float>" is changed to "vector<float, aligned_allocater<float,64>>", and the -lmemkind link option is added at link time, so that the temporary arrays are allocated in MCDRAM high-speed memory space. The specific implementation program is as follows:
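A minimal allocator of this shape might look like the sketch below. It is not the original implementation: the class name and template parameters follow the text, but the member layout is an assumption, and the macro `USE_MCDRAM` is a hypothetical switch introduced here so that the sketch also builds on machines without the memkind library, where it falls back to `posix_memalign` and ordinary memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <stdlib.h>  // posix_memalign
#include <vector>
#ifdef USE_MCDRAM
#include <hbwmalloc.h>  // memkind high-bandwidth memory API; link with -lmemkind
#endif

template <typename T, std::size_t Alignment>
class aligned_allocater {
public:
    using value_type = T;

    aligned_allocater() noexcept = default;
    template <typename U>
    aligned_allocater(const aligned_allocater<U, Alignment>&) noexcept {}

    // Explicit rebind is required because Alignment is a non-type parameter.
    template <typename U>
    struct rebind { using other = aligned_allocater<U, Alignment>; };

    T* allocate(std::size_t n) {
        void* p = nullptr;
#ifdef USE_MCDRAM
        const int rc = hbw_posix_memalign(&p, Alignment, n * sizeof(T));
#else
        const int rc = posix_memalign(&p, Alignment, n * sizeof(T));
#endif
        if (rc != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
#ifdef USE_MCDRAM
        hbw_free(p);
#else
        std::free(p);
#endif
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocater<T, A>&, const aligned_allocater<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocater<T, A>&, const aligned_allocater<U, A>&) { return false; }
```

A temporary array would then be declared as `std::vector<float, aligned_allocater<float, 64>> blockMax(64);`, so that its storage starts on a 64-byte boundary and, when built with `USE_MCDRAM`, is requested from high-bandwidth memory.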
Step 102: Compute, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated;
In this embodiment, the algorithm flow is shown in Fig. 3. Computing, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated includes: determining the image region that needs to be updated according to the index value corresponding to the first maximum and the subtractPSF function; wherein the index value corresponding to the first maximum is the index value of the maximum pixel absolute value of the image data, and the first maximum is the maximum pixel absolute value of the image data.
To update the image data, as the algorithm flow in Fig. 3 shows, the algorithm computes the image region that needs to be updated from the index of the maximum in the image data, and then updates the image data in that region.
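How the region follows from the index of the maximum can be illustrated with a small sketch. The text does not spell out the geometry, so the following assumes a row-major image, a rectangular PSF whose own peak is aligned with the image peak, and a region equal to the overlap of the shifted PSF with the image; the names `Region` and `updateRegion` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Region of the image to update: columns [x0, x1), rows [y0, y1).
// (dx, dy) is the shift that maps PSF coordinates to image coordinates.
struct Region {
    long x0, x1, y0, y1;
    long dx, dy;
};

Region updateRegion(std::size_t peakPos, long imgWidth, long imgHeight,
                    std::size_t psfPeakPos, long psfWidth, long psfHeight) {
    assert(imgWidth > 0 && imgHeight > 0 && psfWidth > 0 && psfHeight > 0);
    const long rPos = static_cast<long>(peakPos);
    const long pPos = static_cast<long>(psfPeakPos);
    Region r;
    // Convert both flat indices to (x, y) and take the difference.
    r.dx = rPos % imgWidth - pPos % psfWidth;
    r.dy = rPos / imgWidth - pPos / psfWidth;
    // Clip the shifted PSF rectangle to the image bounds.
    r.x0 = std::max(0L, r.dx);
    r.x1 = std::min(imgWidth, r.dx + psfWidth);
    r.y0 = std::max(0L, r.dy);
    r.y1 = std::min(imgHeight, r.dy + psfHeight);
    return r;
}
```

For an 8 x 8 image with its peak at flat index 49 and a 4 x 4 PSF with its peak at flat index 10, this gives columns 0 to 2 and rows 4 to 7 as the region to update.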
Step 103: Update the image data of the image region that needs to be updated, using a multi-threaded parallel algorithm.
In this embodiment, updating the image data of the image region that needs to be updated, using a multi-threaded parallel algorithm, includes: first dividing the image data of the image region that needs to be updated into multiple sub-data blocks;
Then processing the multiple sub-data blocks in parallel with multiple threads, wherein each thread uses the subtractPSF function to update the image data of its corresponding sub-data block. Specifically, each thread accesses its own sub-data block concurrently and independently, computes and updates the image data of that sub-data block, and finally the multiple threads together complete the update of the image data of the whole image region that needs to be updated.
Step 104: Output the updated image data;
In this embodiment, once the multiple threads have completed the update of the image data of the whole image region that needs to be updated, the resulting image data is output, giving the updated image data.
It should be noted that the above is only the specific embodiment of the present invention, it is same as the previously described embodiments or similar Embodiment, and above-described embodiment variation all within protection scope of the present invention.
In addition, this application provides a kind of one embodiment of the processing unit of the view data based on KNL platforms, the dress It is corresponding with the embodiment of the method shown in Fig. 1 to put embodiment, which specifically can be applied in various electronic equipments.
As shown in Fig. 4, the device for processing image data based on a KNL platform of this embodiment includes: an acquisition module, a first processing module, a second processing module and an output module. The acquisition module is used to obtain the image data to be processed. The first processing module is used to calculate the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and to determine, from the calculated maximum, the index value of the maximum pixel absolute value of the image data, where that index value represents the position of the maximum pixel absolute value in the image data. The second processing module is used to calculate, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated. The output module is used to output the updated image data.
Preferably, the first processing module calculating the maximum pixel absolute value of the image data using the multi-threaded parallel algorithm includes: dividing the image data to be processed, obtained by the acquisition module, into multiple basic data blocks; and processing the basic data blocks in parallel with multiple threads, where each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block.
Preferably, the first processing module is further used to create two temporary memory spaces in memory before the findPeak function is used to find the maximum pixel absolute value in each basic data block and the index value of the maximum is determined from the calculated maximum pixel absolute value of the image data.
Preferably, after the maximum pixel absolute value in each basic data block has been found and the index value of the maximum has been determined, the maximum pixel absolute value found by each thread in its basic data block, and the index value of that maximum, are stored in the corresponding temporary memory space according to the identifier of each thread, forming temporary arrays.
The temporary arrays include a first array and a second array. The first array contains the maxima of the pixel absolute values in one-to-one correspondence with the basic data blocks; the second array contains the index values of those maxima, likewise in one-to-one correspondence with the basic data blocks.
Preferably, the first processing module is further used to create an allocator template before the two temporary memory spaces are created in memory, where the allocator template takes two parameters: the data type to allocate and the alignment in bytes. The allocator template is invoked when the two temporary memory spaces are created.
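An allocator template with those two parameters could look like the sketch below. The source gives no code, so the name `AlignedAllocator`, the use of POSIX `posix_memalign`, and the 64-byte alignment in the alias are assumptions chosen for illustration (64 bytes matches the width of a 512-bit vector register).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX)
#include <vector>

// Minimal aligned allocator: T is the element type to allocate,
// Alignment is the required alignment in bytes.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    using value_type = T;
    static_assert((Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    static_assert(Alignment % sizeof(void*) == 0,
                  "posix_memalign needs a multiple of sizeof(void*)");

    // An explicit rebind is required because Alignment is a non-type parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Helper to check whether a pointer satisfies a given byte alignment.
inline bool isAligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Example alias for a temporary array of per-block maxima.
using AlignedFloatVector = std::vector<float, AlignedAllocator<float, 64>>;
```

Declaring the two temporary arrays as, for example, `AlignedFloatVector maxima(numBlocks)` then invokes the allocator template whenever the temporary memory is created.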
Preferably, the first processing module is further used to determine, from the first array, the largest of the multiple maxima of pixel absolute values and to define it as the first maximum; the first maximum is the maximum pixel absolute value of the image data.
Preferably, the first processing module is further used to determine the first maximum from the first array using a serial algorithm.
Preferably, the second processing module calculating, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated includes: determining the image region that needs to be updated according to the index value corresponding to the first maximum and the subtractPSF function, where the index value corresponding to the first maximum is the index value of the maximum pixel absolute value of the image data.
Preferably, the second processing module using the multi-threaded parallel algorithm to update the image data of the image region that needs to be updated includes: dividing the image data of the image region that needs to be updated into multiple sub-data blocks; and processing the sub-data blocks in parallel with multiple threads, where each thread uses the subtractPSF function to update the image data of its own sub-data block.
The method and device for processing image data based on a KNL platform proposed by the present invention split the image data and process it with multiple threads in parallel, which solves the problem of serial image processing. Temporary buffers are also allocated, turning the problem of processing one large array into that of processing small arrays, which removes the inter-thread waiting that arises when the large array is processed directly. The program also uses MCDRAM, the on-chip high-speed memory on KNL, to improve memory-access efficiency. In addition, to exploit the many VPUs (Vector Processing Units) on KNL, compiler directives are added to the inner loops of the program to direct the compiler to vectorize them. The specific implementation of the present invention includes: (1) implementing the deConvolution algorithm on the KNL computing platform; (2) using multi-threaded parallel processing; (3) allocating temporary buffers to avoid inter-thread waiting; (4) allocating MCDRAM high-speed memory; (5) AVX-512 vectorization of the core loops. Specifically:
(1) Implementing the deConvolution algorithm on the KNL computing platform: KNL (Knights Landing) is Intel's second-generation Xeon Phi many-core processor. It carries up to 72 general-purpose CPU cores, each core supports 4 concurrent threads, and up to 288 threads can run concurrently. Compared with the first-generation Xeon Phi many-core processor KNC (Knights Corner), its most significant feature is that KNL supports a host-processor mode and no longer acts only as a coprocessor assisting the CPU. Host-processor mode greatly improves the portability and convenience of programming on the KNL platform, and it avoids copying data back and forth over PCI-E between main memory and the accelerator, eliminating the time spent on data migration. This is also the main difference between KNL and other coprocessors.
(2) Using OpenMP multi-threaded parallel processing: OpenMP (Open Multi-Processing) multi-threaded parallel processing is used in two places. First, within the multi-threaded block, each thread searches its own partition of the image for the maximum and the index of the maximum. Second, each thread updates its own partition of the image data, so that together the threads complete the update of the whole image.
(3) Allocating temporary buffers to avoid inter-thread waiting: this mainly concerns the search for the image maximum. First, two temporary arrays are allocated (one stores the maxima, the other stores the indices of the maxima), each sized to the number of OpenMP threads. Each thread is also given a thread-private local maximum and local maximum index. Each thread then searches its own partition of the image data and stores the local maximum it finds, together with its index, in the temporary arrays. After all OpenMP threads have finished, the maximum and its index are found by a serial search over the temporary arrays. Allocating temporary buffers in this way avoids the thread waiting time incurred when multiple threads search the complete image data directly for the maximum.
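The two-phase search described in (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Peak` and `findPeakParallel` are hypothetical, and the temporary arrays are sized by a caller-chosen block count rather than by the OpenMP thread count so that the code does not depend on the OpenMP runtime. Without OpenMP support the pragma is ignored and the blocks are scanned serially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: the largest absolute pixel value and its flat index.
struct Peak {
    float absValue;
    std::size_t index;
};

Peak findPeakParallel(const std::vector<float>& image, int numBlocks) {
    assert(!image.empty() && numBlocks > 0);
    const std::size_t n = image.size();

    // Two temporary arrays: one slot per block for the local maximum and its index.
    std::vector<float> blockMax(numBlocks, -1.0f);
    std::vector<std::size_t> blockIdx(numBlocks, 0);

    // Phase 1 (parallel): each block is scanned independently using private
    // local variables; every block writes only to its own slot, so no thread
    // has to wait for another.
    #pragma omp parallel for schedule(static)
    for (int b = 0; b < numBlocks; ++b) {
        const std::size_t begin = n * b / numBlocks;
        const std::size_t end = n * (b + 1) / numBlocks;
        float localMax = -1.0f;  // below any absolute value, marks an empty block
        std::size_t localIdx = begin;
        for (std::size_t i = begin; i < end; ++i) {
            const float a = std::fabs(image[i]);
            if (a > localMax) {
                localMax = a;
                localIdx = i;
            }
        }
        blockMax[b] = localMax;
        blockIdx[b] = localIdx;
    }

    // Phase 2 (serial): reduce the small temporary arrays to the global peak.
    Peak peak{blockMax[0], blockIdx[0]};
    for (int b = 1; b < numBlocks; ++b) {
        if (blockMax[b] > peak.absValue) {
            peak.absValue = blockMax[b];
            peak.index = blockIdx[b];
        }
    }
    return peak;
}
```

The serial phase touches only `numBlocks` elements, so its cost is negligible next to the parallel scan of the full image.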
(4) Allocating MCDRAM high-speed memory: MCDRAM (Multi-Channel DRAM) is the on-chip high-speed memory of the KNL processor. Its bandwidth is more than 4 times that of ordinary DDR4, and its capacity is up to 16 GB. Through the BIOS, MCDRAM can be configured in one of three modes: as a third-level cache, as an independent NUMA node, or as a mixture of the two. The present invention uses the independent NUMA node mode and allocates critical data in MCDRAM, which improves memory-access efficiency.
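The source does not say which interface is used to place data in MCDRAM. One common option on KNL in the independent NUMA node mode is the `hbwmalloc` interface of the memkind library; the sketch below assumes that choice. The names `HbwBuffer`, `allocCritical` and `freeCritical` and the `USE_MEMKIND` build switch are hypothetical. Without `USE_MEMKIND` defined, the code falls back to ordinary `malloc` so that it still builds on machines without memkind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef USE_MEMKIND       // define when building against memkind (link with -lmemkind)
#include <hbwmalloc.h>   // hbw_check_available, hbw_malloc, hbw_free
#endif

// Hypothetical handle that remembers where the buffer was allocated.
struct HbwBuffer {
    float* data;
    std::size_t count;
    bool inHbw;  // true if the buffer lives in high-bandwidth memory
};

// Allocate a float buffer, preferring MCDRAM when it is available.
HbwBuffer allocCritical(std::size_t count) {
    assert(count > 0);
    HbwBuffer buf{nullptr, count, false};
#ifdef USE_MEMKIND
    if (hbw_check_available() == 0) {  // 0 means high-bandwidth memory is present
        buf.data = static_cast<float*>(hbw_malloc(count * sizeof(float)));
        buf.inHbw = (buf.data != nullptr);
    }
#endif
    if (buf.data == nullptr) {
        // Fallback: ordinary DDR memory.
        buf.data = static_cast<float*>(std::malloc(count * sizeof(float)));
    }
    return buf;
}

// Release the buffer with the allocator that produced it.
void freeCritical(HbwBuffer& buf) {
#ifdef USE_MEMKIND
    if (buf.inHbw) {
        hbw_free(buf.data);
        buf.data = nullptr;
        buf.inHbw = false;
        return;
    }
#endif
    std::free(buf.data);
    buf.data = nullptr;
}
```

An alternative that needs no code changes is to bind the whole process to the MCDRAM NUMA node at launch; which approach the original program used is not stated in the source.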
(5) Using AVX-512 automatic vectorization: with SIMD technology, the compiler emits instructions that load and process multiple data elements at once. The 512-bit vector processing units and the AVX-512 instruction set of KNL allow 16 single-precision floating-point numbers to be processed at a time, which greatly accelerates the program's computation.
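The figure of 16 follows from the register width: 512 bits / 32 bits per single-precision float = 16 lanes. The source does not name the compiler directive it adds to the inner loop; the sketch below assumes the OpenMP `simd` directive as one portable choice, and the function name `scaledSubtract` is hypothetical. The directive only takes effect when the compiler is invoked with OpenMP (or OpenMP SIMD) support and an AVX-512 target; otherwise the loop compiles as ordinary scalar code.

```cpp
#include <cassert>
#include <cstddef>

// Inner-loop kernel of the update step: y[i] -= scale * x[i].
// The simd directive asks the compiler to vectorize the loop.
void scaledSubtract(float* y, const float* x, std::size_t n, float scale) {
    assert(y != nullptr && x != nullptr);
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        y[i] -= scale * x[i];
    }
}
```

The directive asserts that the iterations are independent, so the caller must ensure `x` and `y` do not overlap.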
It should be noted that, herein, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The ordering of the embodiments of the present invention is for description only and does not indicate their relative merit. Any scheme identical or similar to the design concept of the present invention, any scheme identical or similar to the embodiments of the present invention, and any variation of the embodiments of the present invention fall within the protection scope of the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.

Claims (10)

  1. A method for processing image data based on a KNL platform, characterized in that the method includes:
    obtaining image data to be processed;
    calculating the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining, from the calculated maximum pixel absolute value of the image data, the index value of the maximum pixel absolute value of the image data;
    calculating, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated;
    updating, using the multi-threaded parallel algorithm, the image data of the image region that needs to be updated;
    outputting the updated image data;
    wherein the index value of the maximum pixel absolute value represents the position of the maximum pixel absolute value in the image data.
  2. The method for processing image data based on a KNL platform according to claim 1, characterized in that calculating the maximum pixel absolute value of the image data using the multi-threaded parallel algorithm includes:
    dividing the image data into multiple basic data blocks;
    processing the multiple basic data blocks in parallel with multiple threads, wherein each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block.
  3. The method for processing image data based on a KNL platform according to claim 2, characterized in that the method further includes:
    creating two temporary memory spaces in memory before the findPeak function is used to find the maximum pixel absolute value in the basic data block and the index value of the maximum pixel absolute value is determined from the calculated maximum pixel absolute value of the image data;
    after the maximum pixel absolute value in the basic data block has been found and the index value of the maximum pixel absolute value has been determined from the calculated maximum pixel absolute value of the image data, storing, according to the identifier of each thread, the maximum pixel absolute value found by that thread in its basic data block and the index value of that maximum in the corresponding temporary memory space, forming temporary arrays;
    wherein the temporary arrays include a first array and a second array; the first array contains the maxima of the pixel absolute values in one-to-one correspondence with the multiple basic data blocks, and the second array contains the index values of the maxima of the pixel absolute values in one-to-one correspondence with the multiple basic data blocks.
  4. The method for processing image data based on a KNL platform according to claim 3, characterized in that the method further includes:
    creating an allocator template before the two temporary memory spaces are created in the memory, wherein the allocator template includes two parameters: the data type to allocate and the alignment in bytes;
    invoking the allocator template when the two temporary memory spaces are created.
  5. The method for processing image data based on a KNL platform according to claim 2, characterized in that the method further includes:
    determining, from the first array, the largest of the multiple maxima of pixel absolute values and defining it as the first maximum, the first maximum being the maximum pixel absolute value of the image data.
  6. The method for processing image data based on a KNL platform according to claim 5, characterized in that the method further includes:
    determining the first maximum from the first array using a serial algorithm.
  7. The method for processing image data based on a KNL platform according to claim 1, characterized in that calculating, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated includes:
    determining the image region of the image data that needs to be updated according to the index value corresponding to the first maximum and the subtractPSF function;
    wherein the index value corresponding to the first maximum is the index value of the maximum pixel absolute value of the image data.
  8. The method for processing image data based on a KNL platform according to claim 7, characterized in that updating, using the multi-threaded parallel algorithm, the image data of the image region that needs to be updated includes:
    dividing the image data of the image region that needs to be updated into multiple sub-data blocks;
    processing the multiple sub-data blocks in parallel with multiple threads, wherein each thread uses the subtractPSF function to update the image data of its own sub-data block.
  9. A device for processing image data based on a KNL platform, characterized in that the device includes:
    an acquisition module, for obtaining image data to be processed;
    a first processing module, for calculating the maximum pixel absolute value of the image data using a multi-threaded parallel algorithm, and determining, from the calculated maximum pixel absolute value of the image data, the index value of the maximum pixel absolute value of the image data;
    a second processing module, for calculating, from the index value of the maximum pixel absolute value, the image region of the image data that needs to be updated;
    an output module, for outputting the updated image data;
    wherein the index value of the maximum pixel absolute value of the image data represents the position of the maximum pixel absolute value in the image data.
  10. The device for processing image data based on a KNL platform according to claim 9, characterized in that the first processing module calculating the maximum pixel absolute value of the image data using the multi-threaded parallel algorithm includes:
    dividing the image data into multiple basic data blocks;
    processing the multiple basic data blocks in parallel with multiple threads, wherein each thread uses the findPeak function to find the maximum pixel absolute value in its basic data block.
CN201711407553.9A 2017-12-22 2017-12-22 A kind of processing method and processing device of the view data based on KNL platforms Pending CN108008975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711407553.9A CN108008975A (en) 2017-12-22 2017-12-22 A kind of processing method and processing device of the view data based on KNL platforms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711407553.9A CN108008975A (en) 2017-12-22 2017-12-22 A kind of processing method and processing device of the view data based on KNL platforms

Publications (1)

Publication Number Publication Date
CN108008975A true CN108008975A (en) 2018-05-08

Family

ID=62060735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711407553.9A Pending CN108008975A (en) 2017-12-22 2017-12-22 A kind of processing method and processing device of the view data based on KNL platforms

Country Status (1)

Country Link
CN (1) CN108008975A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102044071A (en) * 2010-12-28 2011-05-04 上海大学 Single-pixel margin detection method based on FPGA
US8213518B1 (en) * 2006-10-31 2012-07-03 Sony Computer Entertainment Inc. Multi-threaded streaming data decoding
US8531725B2 (en) * 2010-06-08 2013-09-10 Canon Kabushiki Kaisha Rastering disjoint regions of the page in parallel
CN106910157A (en) * 2017-02-17 2017-06-30 公安部第研究所 The image rebuilding method and device of a kind of multistage parallel

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
侯天娇等: "一种基于FPGA的彩色图像实时增强方法", 《液晶与显示》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874547A (en) * 2018-06-27 2018-11-23 郑州云海信息技术有限公司 A kind of data processing method and device of astronomy software Gridding
CN109062636A (en) * 2018-07-20 2018-12-21 浪潮(北京)电子信息产业有限公司 A kind of data processing method, device, equipment and medium
CN110879744A (en) * 2018-09-06 2020-03-13 第四范式(北京)技术有限公司 Method and system for executing computation graph by multiple threads
CN110879744B (en) * 2018-09-06 2022-08-16 第四范式(北京)技术有限公司 Method and system for executing computation graph by multiple threads
CN111429413A (en) * 2020-03-18 2020-07-17 中国建设银行股份有限公司 Image segmentation method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190222

Address after: 100085 Beijing Haidian District Shangdi Information Road 2-1 C Building 1 Floor

Applicant after: INSPUR (BEIJING) ELECTRONIC INFORMATION INDUSTRY Co.,Ltd.

Address before: Room 1601, floor 16, 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province

Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180508