CN109062636A - Data processing method, apparatus, device and medium - Google Patents
- Publication number
- CN109062636A (application number CN201810803156.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- initial data
- function
- mpi
- process
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/448—Execution paradigms, e.g. implementations of programming paradigms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5018—Thread allocation
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a data processing method, apparatus, device and medium. The method comprises: starting N processing processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each processing process; obtaining initial data and distributing each piece of initial data to its corresponding target processing process; and, in each target processing process, using OpenMP to divide the initial data into data segments and distribute them to preset threads, so that the data operation function operates on the initial data. Through the cooperation of MPI and OpenMP, the method starts multiple processing processes, and within each process OpenMP distributes the initial data obtained by that process to threads for parallel processing, thereby improving the overall performance of computation on massive data. The invention further provides a data processing apparatus, device and medium with the same beneficial effects.
Description
Technical field
The present invention relates to the field of big data, and in particular to a data processing method, apparatus, device and medium.
Background art
With the continuous development of science and technology and the deepening exploration of astronomy, the Square Kilometre Array (SKA) telescope project has attracted wide attention. In this project, the performance of processing massive volumes of astronomical data is one of the key factors that determine whether the project can advance effectively.
In current astronomical image data processing, the underlying basis is computation over massive amounts of image data. Such computation generally incurs a large time overhead, and the required results can only be obtained through it. Moreover, the larger the volume of data processed, the more accurate the results, the better the overall quality of the astronomical image processing, and the higher the clarity and usability of the final astronomical images. Performance support for computing over massive data is therefore a key factor in whether astronomical image data processing can advance further.
The deConvolution program is an important part of the science data processing module in the SKA project and one of the most computationally intensive applications in that module. The deConvolution program in the existing ASKAP software package uses a single process that, through OpenMP, distributes the computation tasks it receives to multiple threads, which then cooperate to process the initial data. However, when computation on the initial data is carried out by a single process, that process can only handle a limited amount of initial data per unit time. The hardware resources of the computing device therefore cannot be fully utilized, and the overall performance of data processing on the computing device remains relatively low.
It can be seen that providing a data processing method that improves the overall performance of a computing device when processing massive data is a problem urgently to be solved by those skilled in the art.
Summary of the invention
The object of the present invention is to provide a data processing method, apparatus, device and medium that improve the overall performance of a computing device when processing massive data.
To solve the above technical problem, the present invention provides a data processing method, comprising:
Starting N processing processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each processing process;
Obtaining initial data, and distributing each piece of initial data to its corresponding target processing process;
In each target processing process, using OpenMP to divide the initial data into data segments and distribute them to preset threads, so that the data operation function operates on the initial data.
Preferably, starting N processing processes using MPI specifically comprises:
Starting N deConvolution processing processes using MPI;
Correspondingly, the initial data is specifically an initial data array;
Correspondingly, the data operation function specifically includes a findPeaks function and a subtractPSF function.
Preferably, starting N deConvolution processing processes using MPI specifically comprises:
Starting, using MPI, N deConvolution processing processes on a computing device equipped with an Intel KNM processor.
Preferably, a #pragma simd directive is preset in the findPeaks function and the subtractPSF function;
Correspondingly, before the data operation function operates on the initial data array, the method further comprises:
Compiling the data operation function with the AVX512 instruction set enabled in the compiled code.
Preferably, obtaining the initial data specifically comprises:
Obtaining the initial data array from MCDRAM memory, where the initial data array in the MCDRAM memory has been 8-byte aligned in advance.
Preferably, the specific steps of using OpenMP to divide the initial data array into data segments, distributing them to preset threads, and operating on the initial data array through the findPeaks function comprise:
Allocating a first temporary space and a second temporary space in advance;
Controlling each preset thread to operate on its corresponding data segment through the findPeaks function, recording the maximum value it produces in the first temporary space, and recording the index corresponding to that maximum value in the second temporary space;
When the operation on every data segment is complete, performing a reduction over the maximum values in the first temporary space to produce a global maximum, and obtaining the corresponding global maximum index from the second temporary space according to the global maximum.
Preferably, after the data operation function operates on the initial data, the method further comprises:
Recording the result of the operation on the initial data in a preset result log.
In addition, the present invention further provides a data processing apparatus, comprising:
A process starting module, configured to start N processing processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each processing process;
A data distribution module, configured to obtain initial data and distribute each piece of initial data to its corresponding target processing process;
A function operation module, configured to, in each target processing process, use OpenMP to divide the initial data into data segments and distribute them to preset threads, so that the data operation function operates on the initial data.
In addition, the present invention further provides a data processing device, comprising:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the above data processing method when executing the computer program.
In addition, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above data processing method.
In the data processing method provided by the present invention, more than one processing process is started through MPI, and a data operation function and multiple threads are preset in each processing process. Initial data is then obtained and each piece of initial data is distributed to its corresponding target processing process; in each target processing process, OpenMP divides the initial data into data segments and distributes them to the preset threads, so that the threads jointly operate on the initial data according to the data operation function. Through the cooperation of MPI and OpenMP, the method starts multiple processing processes that each obtain their own initial data at the same time, and within each process OpenMP distributes the initial data obtained by that process to threads for parallel processing. This increases the total amount of initial data the computing device can process per unit time and thus improves the overall performance of computing over massive data. In addition, the present invention further provides a data processing apparatus, device and medium with the same beneficial effects.
Brief description of the drawings
To describe the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a data processing apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The core of the present invention is to provide a data processing method that improves the overall performance of a computing device when processing massive data. Another core of the present invention is to provide a data processing apparatus, device and medium.
To help those skilled in the art better understand the solution of the present invention, the present invention is further described in detail below with reference to the drawings and specific embodiments.
Embodiment one
Fig. 1 is a flowchart of a data processing method provided by an embodiment of the present invention. Referring to Fig. 1, the specific steps of the data processing method include:
Step S10: Start N processing processes using MPI.
Here, N is an integer greater than 1, and a data operation function is preset in each processing process.
It should be noted that MPI (Message Passing Interface) is a cross-language communication protocol for writing parallel programs that supports both point-to-point and broadcast communication. MPI is a message-passing application programming interface, including protocol and semantic specifications that describe how its features must behave in any implementation. MPI aims at high performance, scalability and portability, and it remains the dominant model in high-performance computing today. Because MPI allows all processes in a given group to participate in global data processing and communication operations and to synchronize at specific points, once this step starts N processing processes by calling the MPI interface, the N processes can carry out data processing together. It should be emphasized that N in this step is an integer greater than 1, that is, the number of processing processes must be greater than 1.
Step S11: Obtain initial data, and distribute each piece of initial data to its corresponding target processing process.
It should be noted that the initial data obtained in this step is the data to be processed, and in this step the data to be processed is assigned to the corresponding target processing processes. There is a correspondence between initial data and target processing processes: a given piece of initial data is operated on only by its corresponding target processing process. The specific type and content of the initial data depend on the actual usage scenario and are not specifically limited here.
Step S12: In each target processing process, use OpenMP to divide the initial data into data segments and distribute them to the preset threads, so that the data operation function operates on the initial data.
It should be noted that OpenMP provides a high-level abstract description of parallel algorithms: the programmer expresses intent by adding dedicated #pragma directives to the source code, and the compiler then parallelizes the program automatically and inserts synchronization, mutual exclusion and communication where necessary. In this step, therefore, OpenMP divides the initial data into data segments and distributes them to the preset threads in the target processing process, and the preset threads operate concurrently on the data segments according to the data operation function. This achieves parallel processing of the initial data and ensures the efficiency with which each target processing process handles its initial data.
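As a minimal sketch of Step S12 inside one process, the C fragment below uses an OpenMP `parallel for` with `schedule(static)` and no chunk size, which splits the iteration range into roughly equal contiguous chunks, at most one per thread; these chunks play the role of the data segments. The element-wise operation `scale_by_two` is a hypothetical stand-in for the preset data operation function. If the code is compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element-wise "data operation function". */
typedef float (*op_fn)(float);

static float scale_by_two(float x) { return 2.0f * x; }

/* Divides data[0..n) into contiguous segments, one per OpenMP thread
 * (static schedule without a chunk size), and applies op to every element
 * of each segment. Without OpenMP the loop runs serially. */
static void apply_in_segments(float *data, size_t n, op_fn op)
{
    assert(data != NULL || n == 0);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        data[i] = op(data[i]);
}
```

Because each element is written by exactly one thread, no synchronization is needed inside the loop.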
In the data processing method provided by the present invention, more than one processing process is started through MPI, and a data operation function and multiple threads are preset in each processing process. Initial data is then obtained and each piece of initial data is distributed to its corresponding target processing process; in each target processing process, OpenMP divides the initial data into data segments and distributes them to the preset threads, so that the threads jointly operate on the initial data according to the data operation function. Through the cooperation of MPI and OpenMP, the method starts multiple processing processes that each obtain their own initial data at the same time, and within each process OpenMP distributes the initial data obtained by that process to threads for parallel processing. This increases the total amount of initial data the computing device can process per unit time and thus improves the overall performance of computing over massive data.
Embodiment two
On the basis of the above embodiment, the present invention further provides a series of preferred embodiments.
As a preferred embodiment, starting N processing processes using MPI specifically comprises:
Starting N deConvolution processing processes using MPI;
Correspondingly, the initial data is specifically an initial data array;
Correspondingly, the data operation function specifically includes a findPeaks function and a subtractPSF function.
This embodiment is suitable for data processing in SKA (the Square Kilometre Array telescope) and similar scenarios. It should be noted that science data processing (SDP) is the most critical part of the SKA project, and the algorithms involved in SDP include deGridding, deConvolution, FFT and Projection. The deConvolution (image deconvolution) algorithm accounts for about 20% of the total SDP computation and is thus a relatively important computation in SDP. To improve the efficiency of processing data with the deConvolution algorithm, this embodiment uses processing processes started by MPI so that multiple sets of image data are processed at the same time, making full use of the processor's computing performance and the device's bandwidth.
The findPeaks function finds the maximum value and its index position in an astronomical image array; specifically, it obtains the maximum value in the array and the corresponding index position by traversing the array and comparing elements. The subtractPSF function determines, from the maximum value of the image array and its index, the part of the image that needs to be updated, and then updates the image data; specifically, it computes the image region to be updated from the maximum-value index in the image array and then updates that region of the image data. Since the findPeaks and subtractPSF functions are well known to those skilled in the art, their specific content is not repeated here. In this embodiment, the initial data arrays corresponding to astronomical images are processed by parallel deConvolution processing processes, and by parallel preset threads within each deConvolution processing process, which improves the overall efficiency of executing the deConvolution algorithm.
On the basis of the above embodiment, as a preferred embodiment, starting N deConvolution processing processes using MPI specifically comprises:
Starting, using MPI, N deConvolution processing processes on a computing device equipped with an Intel KNM processor.
It should be noted that the Intel KNM (Knights Mill) processor is a many-core processor released by Intel. It has a large number of logical cores, but because the clock frequency of each core is low, single-core performance is weak, and programs often perform poorly when running relatively time-consuming serial code. In this embodiment, the KNM processor executes parallel deConvolution processing processes, so every core of the KNM processor can be put to reasonable use, the processor's advantage of having many cores is exploited to the greatest extent, and the overall data processing efficiency is further improved.
In addition, on the basis of the above embodiment, as a preferred embodiment, a #pragma simd directive is preset in the findPeaks function and the subtractPSF function;
Correspondingly, before the data operation function operates on the initial data array, the method further comprises:
Compiling the data operation function with the AVX512 instruction set enabled in the compiled code.
Because both the findPeaks function and the subtractPSF function contain loop iterations over the computation, a "#pragma simd" preprocessing directive can be added in front of the for loop statements to guide the compiler to vectorize these loops, which significantly improves the execution efficiency of the functions. At the same time, the compile option "-xCOMMON-AVX512" is added at compile time so that the compiled code uses the AVX512 instruction set. This scheme makes full use of the AVX512 instruction set on KNM and improves program performance through extensive vectorization.
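The loop below sketches the kind of update loop the directive targets. It uses the OpenMP 4.0 `#pragma omp simd` directive, the portable counterpart of Intel's `#pragma simd`; the function name `axpy_update` is hypothetical. With the Intel compiler, building with `-xCOMMON-AVX512` (plus `-qopenmp` so that the OpenMP directive is honored) allows AVX-512 code to be generated; other compilers ignore the pragma unless OpenMP or OpenMP SIMD support is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Vectorizable update loop: residual[i] -= scale * psf[i].
 * `restrict` promises the arrays do not overlap, and `#pragma omp simd`
 * asks the compiler to vectorize the loop. Compilers without OpenMP
 * support ignore the pragma and may still auto-vectorize. */
static void axpy_update(float *restrict residual, const float *restrict psf,
                        size_t n, float scale)
{
    assert(residual != NULL && psf != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        residual[i] -= scale * psf[i];
}
```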
In addition, on the basis of the above embodiment, as a preferred embodiment, obtaining the initial data specifically comprises:
Obtaining the initial data array from MCDRAM memory, where the initial data array in the MCDRAM memory has been 8-byte aligned in advance.
It should be noted that MCDRAM (Multi-Channel Dynamic Random Access Memory) is equivalent to providing multiple memory controllers in the memory chip. The memory controllers can work independently of one another, each controlling its own memory channel, so the bandwidth and read speed of MCDRAM are several times those of single-channel memory and its data processing efficiency is correspondingly higher. In addition, the initial data array in MCDRAM is 8-byte aligned in advance so that data can be moved efficiently in memory. For the KNM processor, data movement in memory reaches its optimum when the starting address of the data lies on an 8-byte boundary, which improves the overall efficiency of data acquisition. To help the compiler vectorize, memory must be allocated with 8-byte alignment, and a pragma directive must inform the compiler that memory accesses are aligned. In code, for aligned heap-allocated data, the "_mm_malloc" and "_mm_free" functions can be used to allocate and release the arrays, and a "#pragma vector aligned" clause must additionally be inserted to inform the compiler that all arrays accessed in a particular loop are aligned.
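A portable C sketch of the aligned-allocation step follows. It uses C11 `aligned_alloc` and `free` in place of the Intel-specific `_mm_malloc` and `_mm_free`, and the OpenMP `aligned` clause in place of `#pragma vector aligned`; both substitutions are assumptions made for portability. `ARRAY_ALIGN` follows the 8 bytes stated above; note that one 512-bit AVX-512 register spans 64 bytes, so 64-byte alignment is often chosen for AVX-512 code. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Alignment in bytes, as stated in the text; must be a power of two. */
#define ARRAY_ALIGN 8

/* Portable stand-in for _mm_malloc(n * sizeof(float), ARRAY_ALIGN):
 * C11 aligned_alloc, with the size rounded up to a multiple of the
 * alignment as C11 requires (overflow check omitted for brevity).
 * Release the result with free(), the analogue of _mm_free. */
static float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + ARRAY_ALIGN - 1) / ARRAY_ALIGN * ARRAY_ALIGN;
    if (bytes == 0)
        bytes = ARRAY_ALIGN;
    return aligned_alloc(ARRAY_ALIGN, bytes);
}

/* Sums an aligned array. The OpenMP aligned clause promises the compiler
 * that `a` is ARRAY_ALIGN-aligned, like "#pragma vector aligned". */
static float sum_aligned(const float *a, size_t n)
{
    assert(((uintptr_t)a % ARRAY_ALIGN) == 0);
    float s = 0.0f;
    #pragma omp simd aligned(a : ARRAY_ALIGN) reduction(+ : s)
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```

The alignment promise must be true: telling the compiler an array is aligned when it is not can produce faulting aligned loads.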
Furthermore it is also possible to prefetch the benefit that means such as (Memory Prefetch) improve memory access efficiency and cache using memory
With rate.In the core innermost loop of deConvolution algorithm, it can be prefetched by calling " _ mm_prefetch " function
Array of data, is stored in cache in advance, avoids and calculated by the array of data used required for several subsequent calculating of step
The occurrence of data cache misses are read in journey, to improve the memory access efficiency to data.
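The sketch below illustrates software prefetching in a peak-search loop. It uses the GCC/Clang builtin `__builtin_prefetch` as a stand-in for `_mm_prefetch(ptr, _MM_HINT_T0)`; both are only hints and do not change results. The prefetch distance `PREFETCH_DIST` and the function name are hypothetical; for a purely sequential scan like this one the hardware prefetcher often already helps, so any gain should be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; tune by measurement. */
#define PREFETCH_DIST 64

/* Maximum search that requests data PREFETCH_DIST elements ahead.
 * __builtin_prefetch(addr, 0, 3): read access, high temporal locality. */
static float max_with_prefetch(const float *a, size_t n)
{
    assert(a != NULL && n > 0);
    float m = a[0];
    for (size_t i = 1; i < n; ++i) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        if (a[i] > m)
            m = a[i];
    }
    return m;
}
```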
In addition, on the basis of the above embodiment, as a preferred embodiment, the specific steps of using OpenMP to divide the initial data array into data segments, distributing them to preset threads, and operating on the initial data array through the findPeaks function comprise:
Allocating a first temporary space and a second temporary space in advance;
Controlling each preset thread to operate on its corresponding data segment through the findPeaks function, recording the maximum value it produces in the first temporary space, and recording the index corresponding to that maximum value in the second temporary space;
When the operation on every data segment is complete, performing a reduction over the maximum values in the first temporary space to produce a global maximum, and obtaining the corresponding global maximum index from the second temporary space according to the global maximum.
It should be noted that, when the findPeaks function is applied to the data segments by multiple threads in parallel, each thread produces the maximum value of its current data segment, and a further reduction across these maximum values is needed to obtain the global maximum of the initial data array. The existing deConvolution program uses a critical clause to implement the reduction across the maximum values of the threads in a process. However, when the reduction is performed through a critical clause, the maximum value being operated on is "locked" in advance and cannot be accessed by other threads during that reduction. This approach therefore increases thread waiting time and lowers execution efficiency, and it is the bottleneck of findPeaks performance. To address this, a corresponding temporary space can be allocated for each process, for example the arrays "temp_Peak" and "temp_Pos". Each preset thread is controlled to operate on its corresponding data segment through the findPeaks function, recording the maximum value it produces in the temp_Peak array and the corresponding index in the temp_Pos array. When every thread has finished operating on its data segment, a reduction over the maximum values in temp_Peak produces the global maximum, and the corresponding global maximum index is obtained from temp_Pos according to the global maximum. This embodiment avoids the maximum value being "locked" during the computation and improves the overall computing efficiency.
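The lock-free reduction described above can be sketched in C with OpenMP as follows. Each thread scans its own contiguous segment and writes its local maximum and index into its own slot of `temp_peak` and `temp_pos` (lower-case versions of the `temp_Peak` and `temp_Pos` arrays named in the text), and a serial pass then reduces the per-thread results, so no critical section is entered during the scan. The function name and the segment formula are assumptions. When compiled without OpenMP, the code runs as a single thread.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Lock-free parallel findPeaks sketch. Returns the global maximum of
 * image[0..n) and stores its index (first occurrence) in *pos. */
static float find_peak_parallel(const float *image, long n, long *pos)
{
    assert(image != NULL && n > 0);
#ifdef _OPENMP
    int max_threads = omp_get_max_threads();
#else
    int max_threads = 1;
#endif
    /* Per-thread temporary spaces: one slot per thread, no locking. */
    float *temp_peak = malloc((size_t)max_threads * sizeof *temp_peak);
    long *temp_pos = malloc((size_t)max_threads * sizeof *temp_pos);
    assert(temp_peak != NULL && temp_pos != NULL);
    int team = 1;

    #pragma omp parallel num_threads(max_threads)
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nt = omp_get_num_threads();
#else
        int tid = 0, nt = 1;
#endif
        /* Contiguous data segment [lo, hi) handled by this thread. */
        long lo = n * tid / nt, hi = n * (tid + 1) / nt;
        float best = 0.0f;
        long best_i = -1; /* -1 marks an empty segment */
        for (long i = lo; i < hi; ++i)
            if (best_i < 0 || image[i] > best) { best = image[i]; best_i = i; }
        temp_peak[tid] = best;
        temp_pos[tid] = best_i;
        #pragma omp single
        team = nt; /* record the actual team size once */
    }

    /* Serial reduction over per-thread results, in thread order. */
    float peak = 0.0f;
    long peak_i = -1;
    for (int t = 0; t < team; ++t)
        if (temp_pos[t] >= 0 && (peak_i < 0 || temp_peak[t] > peak)) {
            peak = temp_peak[t];
            peak_i = temp_pos[t];
        }
    free(temp_peak);
    free(temp_pos);
    *pos = peak_i;
    return peak;
}
```

Because segments are assigned in thread order and the reduction uses a strict comparison, ties resolve to the first occurrence, so the result is deterministic regardless of the thread count.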
In addition, as a preferred embodiment, after the data operation function operates on the initial data, the method further comprises:
Recording the result of the operation on the initial data in a preset result log.
It can be understood that, by recording the result of the operation on the initial data in a preset result log, the user can conveniently retrieve the results from the log as needed, which facilitates subsequent analysis of the results.
Embodiment three
The embodiments of the data processing method have been described in detail above. The present invention further provides a data processing apparatus corresponding to the method. Since the embodiments of the apparatus correspond to those of the method, please refer to the description of the method embodiments for the apparatus embodiments; details are not repeated here.
Fig. 2 is a structural diagram of a data processing apparatus provided by an embodiment of the present invention. The data processing apparatus provided by the embodiment of the present invention comprises:
Process starting module 10, configured to start N processing processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each processing process.
Data distribution module 11, configured to obtain initial data and distribute each piece of initial data to its corresponding target processing process.
Function operation module 12, configured to, in each target processing process, use OpenMP to divide the initial data into data segments and distribute them to the preset threads, so that the data operation function operates on the initial data.
The data processing apparatus provided by the present invention starts more than one processing process through MPI, with a data operation function and multiple threads preset in each processing process. It then obtains initial data, distributes each piece of initial data to its corresponding target processing process, and in each target processing process uses OpenMP to divide the initial data into data segments and distribute them to the preset threads, so that the threads jointly operate on the initial data according to the data operation function. Through the cooperation of MPI and OpenMP, the apparatus starts multiple processing processes that each obtain their own initial data at the same time, and within each process OpenMP distributes the initial data obtained by that process to threads for parallel processing. This increases the total amount of initial data the computing device can process per unit time and thus improves the overall performance of computing over massive data.
Embodiment four
The present invention further provides a data processing device, comprising:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the above data processing method when executing the computer program.
The data processing device provided by the present invention starts more than one processing process through MPI, with a data operation function and multiple threads preset in each processing process. It then obtains initial data, distributes each piece of initial data to its corresponding target processing process, and in each target processing process uses OpenMP to divide the initial data into data segments and distribute them to the preset threads, so that the threads jointly operate on the initial data according to the data operation function. Through the cooperation of MPI and OpenMP, the device starts multiple processing processes that each obtain their own initial data at the same time, and within each process OpenMP distributes the initial data obtained by that process to threads for parallel processing. This increases the total amount of initial data the computing device can process per unit time and thus improves the overall performance of computing over massive data.
The present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above data processing method.
With the computer-readable storage medium provided by the present invention, more than one processing process is started through MPI, with a data operation function and multiple threads preset in each processing process. Initial data is then obtained and each piece of initial data is distributed to its corresponding target processing process; in each target processing process, OpenMP divides the initial data into data segments and distributes them to the preset threads, so that the threads jointly operate on the initial data according to the data operation function. Through the cooperation of MPI and OpenMP, multiple processing processes are started that each obtain their own initial data at the same time, and within each process OpenMP distributes the initial data obtained by that process to threads for parallel processing. This increases the total amount of initial data the computing device can process per unit time and thus improves the overall performance of computing over massive data.
The data processing method, device, equipment and medium provided by the present invention have been described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for their identical or similar parts. Because the device disclosed in the embodiments corresponds to the disclosed method, its description is comparatively brief; refer to the description of the method for the relevant details. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.
Claims (10)
1. a kind of data processing method characterized by comprising
Start N number for the treatment of progress using MPI;Wherein, N is the integer greater than 1, is preset with data operation letter in the treatment progress
Number;
Initial data is obtained, and respectively distributes each initial data to corresponding target treatment progress;
In each target treatment progress, it is all made of OpenMP and the initial data is divided into data slot and is distributed to each
Default thread, to carry out operation to the initial data by the data operation function.
2. The method according to claim 1, wherein starting N processing processes using MPI specifically comprises:
starting N deConvolution processing processes using the MPI;
correspondingly, the initial data is specifically an initial data array; and
correspondingly, the data operation function specifically comprises a findPeaks function and a subtractPSF function.
3. The method according to claim 2, wherein starting N deConvolution processing processes using the MPI specifically comprises:
starting, using the MPI, the N deConvolution processing processes on a computing device equipped with an Intel KNM processor.
4. The method according to claim 3, wherein a #pragma simd directive is preset in the findPeaks function and the subtractPSF function;
correspondingly, before the initial data array is operated on by the data operation function, the method further comprises:
compiling the data operation function, and loading the AVX512 instruction set into the compiled content.
5. The method according to claim 4, wherein obtaining the initial data specifically comprises:
obtaining the initial data array from MCDRAM memory, wherein the initial data array in the MCDRAM memory has been 8-byte aligned in advance.
6. The method according to claim 2, wherein using OpenMP to divide the initial data array into data fragments and distribute them to the preset threads, so that the initial data array is operated on by the findPeaks function, specifically comprises:
allocating a first temporary space and a second temporary space in advance;
controlling each preset thread to operate on its corresponding data fragment using the findPeaks function, recording the resulting maximum value in the first temporary space, and recording the index of that maximum value in the second temporary space; and
when the operation on every data fragment is complete, performing a reduction operation on the maximum values in the first temporary space to obtain a global maximum, and obtaining the corresponding global maximum index from the second temporary space according to the global maximum.
7. The method according to any one of claims 1 to 6, wherein after the initial data is operated on by the data operation function, the method further comprises:
recording the result of the operation on the initial data in a preset result log.
8. a kind of data processing equipment characterized by comprising
Process initiation module, for starting N number for the treatment of progress using MPI;Wherein, N is the integer greater than 1, the treatment progress
In be preset with data operation function;
Data allocation module distributes each initial data to the processing of corresponding target for obtaining initial data, and respectively
Process;
Functional operation module, for being all made of OpenMP and being divided into the initial data in each target treatment progress
Data slot is simultaneously distributed to each default thread, to carry out operation to the initial data by the data operation function.
9. a kind of data processing equipment characterized by comprising
Memory, for storing computer program;
Processor realizes data processing method as described in any one of claim 1 to 7 when for executing the computer program
The step of.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 7.
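Claims 2, 4 and 6 can be illustrated with a C sketch of the two kernels named in claim 2. The function names, the 1-D layout and the choice to pick the peak by absolute value are assumptions (modelled on a Hogbom-CLEAN-style minor cycle), not details taken from the source. `find_peak` follows claim 6: each thread writes its local maximum into a first scratch array and the matching index into a second, and a serial reduction over the first array selects the global maximum and reads its index from the second. `subtract_psf` uses `#pragma omp simd`, the portable OpenMP counterpart of the Intel-specific `#pragma simd` in claim 4; AVX-512 code generation would come from compiler flags such as `-xMIC-AVX512` (Intel compiler) or `-mavx512f` (GCC). Placing the array in MCDRAM (claim 5), for instance with memkind's `hbw_posix_memalign`, is left out so the sketch builds with a stock C compiler.

```c
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* findPeaks-style search (hypothetical). Returns the signed value of the
 * element with the largest magnitude and stores its index in *peak_pos.
 * tmp_val is the "first temporary space" (one local peak per thread),
 * tmp_idx the "second temporary space" (the matching index). */
static float find_peak(const float *img, size_t n, size_t *peak_pos)
{
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    float *tmp_val = malloc((size_t)nthreads * sizeof *tmp_val);
    size_t *tmp_idx = malloc((size_t)nthreads * sizeof *tmp_idx);
    if (tmp_val == NULL || tmp_idx == NULL || n == 0) {
        free(tmp_val);
        free(tmp_idx);
        *peak_pos = n;            /* out-of-range index signals failure */
        return 0.0f;
    }
    for (int t = 0; t < nthreads; ++t) {
        tmp_val[t] = -1.0f;       /* below any |x|, so unused slots never win */
        tmp_idx[t] = 0;
    }

    #pragma omp parallel num_threads(nthreads)
    {
        int t = 0, nt = 1;
#ifdef _OPENMP
        t = omp_get_thread_num();
        nt = omp_get_num_threads();
#endif
        /* Contiguous data fragment [lo, hi) handled by this thread. */
        size_t lo = n * (size_t)t / (size_t)nt;
        size_t hi = n * (size_t)(t + 1) / (size_t)nt;
        float best = -1.0f;
        size_t best_i = lo;
        for (size_t i = lo; i < hi; ++i) {
            float a = img[i] < 0.0f ? -img[i] : img[i];
            if (a > best) { best = a; best_i = i; }
        }
        tmp_val[t] = best;
        tmp_idx[t] = best_i;
    }

    /* Reduction over the first temporary space; the winning slot in the
     * second temporary space gives the global peak index. */
    float peak = -1.0f;
    size_t pos = 0;
    for (int t = 0; t < nthreads; ++t)
        if (tmp_val[t] > peak) { peak = tmp_val[t]; pos = tmp_idx[t]; }

    free(tmp_val);
    free(tmp_idx);
    *peak_pos = pos;
    return img[pos];
}

/* subtractPSF-style update (hypothetical, 1-D): subtract gain * peak * psf
 * from the residual image, with psf[psf_center] aligned to img[peak_pos].
 * Only the overlap of the two arrays is modified. */
static void subtract_psf(float *img, size_t n, const float *psf, size_t psf_n,
                         size_t psf_center, size_t peak_pos,
                         float peak, float gain)
{
    ptrdiff_t off = (ptrdiff_t)peak_pos - (ptrdiff_t)psf_center; /* img index of psf[0] */
    ptrdiff_t lo = off < 0 ? 0 : off;
    ptrdiff_t hi = off + (ptrdiff_t)psf_n;
    if (hi > (ptrdiff_t)n)
        hi = (ptrdiff_t)n;
    const float scale = gain * peak;
    /* Iterations are independent, so the compiler may vectorise them. */
    #pragma omp simd
    for (ptrdiff_t i = lo; i < hi; ++i)
        img[i] -= scale * psf[i - off];
}
```

For instance, on the array {0.5, -3, 1, 2, 0, -1, 2.5, 0.25}, `find_peak` reports index 1 with value -3 whatever the thread count, and subtracting the three-point PSF {0.5, 1, 0.5} centred there with gain 1 sets that element to zero. A CLEAN-style loop would typically repeat the pair until the peak drops below a threshold.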
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810803156.1A CN109062636A (en) | 2018-07-20 | 2018-07-20 | A kind of data processing method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109062636A (en) | 2018-12-21 |
Family ID: 64817750
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810803156.1A Pending CN109062636A (en) | 2018-07-20 | 2018-07-20 | A kind of data processing method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109062636A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104461466A (en) * | 2013-09-25 | 2015-03-25 | 广州中国科学院软件应用技术研究所 | Method for increasing computing speed through parallel computing based on MPI and OpenMP hybrid programming model |
CN105654554A (en) * | 2016-01-06 | 2016-06-08 | 西安电子科技大学 | Parallel computing method for infrared scattering characteristics of non-Lambert surface target |
CN105677577A (en) * | 2016-02-23 | 2016-06-15 | 中国农业银行股份有限公司 | System memory management method and device |
CN108008975A (en) * | 2017-12-22 | 2018-05-08 | 郑州云海信息技术有限公司 | A kind of processing method and processing device of the view data based on KNL platforms |
Non-Patent Citations (1)
Title |
---|
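英特尔软件学院教材编写组 (Intel Software College textbook writing group): 《英特尔平台编程》 (Intel Platform Programming), 上海交通大学出版社 (Shanghai Jiao Tong University Press), 31 January 2011, pp. 192-194 * |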
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110412972A (en) * | 2019-06-12 | 2019-11-05 | 广汽丰田汽车有限公司 | A kind of variable parallel communication control method, equipment and medium based on automobile |
CN110412972B (en) * | 2019-06-12 | 2021-04-20 | 广汽丰田汽车有限公司 | Variable parallel communication control method, equipment and medium based on automobile |
CN110543663A (en) * | 2019-07-22 | 2019-12-06 | 西安交通大学 | Coarse-grained MPI + OpenMP hybrid parallel-oriented structural grid area division method |
CN110543663B (en) * | 2019-07-22 | 2021-07-13 | 西安交通大学 | Coarse-grained MPI + OpenMP hybrid parallel-oriented structural grid area division method |
CN110780038A (en) * | 2019-10-25 | 2020-02-11 | 珠海高凌信息科技股份有限公司 | Method for optimizing matching rate of original data of motor vehicle exhaust detection equipment |
CN110780038B (en) * | 2019-10-25 | 2022-05-10 | 珠海高凌信息科技股份有限公司 | Method for optimizing matching rate of original data of motor vehicle exhaust detection equipment |
CN113778518A (en) * | 2021-08-31 | 2021-12-10 | 中科曙光国际信息产业有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN113778518B (en) * | 2021-08-31 | 2024-03-26 | 中科曙光国际信息产业有限公司 | Data processing method, device, computer equipment and storage medium |
CN115098271A (en) * | 2022-08-25 | 2022-09-23 | 北京医百科技有限公司 | Multithreading data processing method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181221 |