CN109062636A

CN109062636A - Data processing method, apparatus, device and medium

Info

Publication number: CN109062636A
Application number: CN201810803156.1A
Authority: CN
Inventors: 赵旭东; 景璐; 黄雪
Original assignee: Inspur Beijing Electronic Information Industry Co Ltd
Current assignee: Inspur Beijing Electronic Information Industry Co Ltd
Priority date: 2018-07-20
Filing date: 2018-07-20
Publication date: 2018-12-21

Abstract

The invention discloses a data processing method, apparatus, device and medium. The method includes: starting N processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each process; obtaining raw data and distributing each item of raw data to its corresponding target process; and, in each target process, using OpenMP to divide the raw data into data fragments and distribute them to the preset threads, so that the raw data is operated on by the data operation function. By having MPI and OpenMP work together, the method opens multiple processes and, within each process, uses OpenMP to distribute the raw data obtained by that process to threads for parallel processing, thereby improving the overall performance of computation on massive data. In addition, the invention also provides a data processing apparatus, device and medium with the same beneficial effects.

Description

Data processing method, apparatus, device and medium

Technical field

The present invention relates to the field of big data, and in particular to a data processing method, apparatus, device and medium.

Background art

With the continuous development of science and technology and the deepening exploration of astronomy, the Square Kilometer Array (SKA) astronomical telescope has become a project that attracts wide attention. In this project, the performance of processing massive astronomical data is one of the key factors that determine whether the project can advance effectively.

Current astronomical image data processing is built on operations over massive image data. Operating on massive data generally incurs a large time overhead before the corresponding result is obtained. The larger the volume of data processed, the more accurate the result, the better the overall effect of the astronomical image data processing, and the higher the clarity and usability of the final astronomical image. Therefore, the performance available for operating on massive data is a key factor that determines how far astronomical image data processing can advance.

The deConvolution program is an important part of the science data processing module of the SKA project and is one of the most computationally intensive applications in that module. The deConvolution program in the existing ASKAP software package uses a single process: OpenMP distributes the computing tasks received by that process to multiple threads, and the threads cooperate to operate on the raw data. However, when the computation on the raw data is carried out by a single process, that process can only process a certain amount of raw data per unit time. The amount of data processed is therefore relatively limited, the hardware support provided by the computing device cannot be fully used, and the overall data processing performance of the computing device is relatively low.

It can be seen that providing a data processing method that improves the overall performance of a computing device when processing massive data is a problem urgently to be solved by those skilled in the art.

Summary of the invention

The object of the present invention is to provide a data processing method, apparatus, device and medium that improve the overall performance of a computing device when processing massive data.

To solve the above technical problem, the present invention provides a data processing method, comprising:

starting N processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each process;

obtaining raw data, and distributing each item of raw data to its corresponding target process;

in each target process, using OpenMP to divide the raw data into data fragments and distribute them to the preset threads, so that the raw data is operated on by the data operation function.

Preferably, starting N processes using MPI is specifically:

starting N deConvolution processes using MPI;

correspondingly, the raw data is specifically a raw data array;

correspondingly, the data operation function specifically includes the findPeaks function and the subtractPSF function.

Preferably, starting N deConvolution processes using MPI is specifically:

starting the N deConvolution processes using MPI on a computing device equipped with an Intel KNM processor.

Preferably, a #pragma simd directive is preset in the findPeaks function and the subtractPSF function;

correspondingly, before the raw data array is operated on by the data operation function, the method further comprises:

compiling the data operation function with the AVX512 instruction set enabled in the compilation.

Preferably, obtaining the raw data is specifically:

obtaining the raw data array from MCDRAM memory, where the raw data array in the MCDRAM memory has been 8-byte aligned in advance.

Preferably, the specific steps of using OpenMP to divide the raw data array into data fragments and distribute them to the preset threads, so that the raw data array is operated on by the findPeaks function, include:

allocating a first temporary space and a second temporary space in advance;

controlling each preset thread to operate on its corresponding data fragment with the findPeaks function, recording the resulting maximum value in the first temporary space, and recording the index of that maximum value in the second temporary space;

when the operation on every data fragment is complete, performing a reduction over the maximum values in the first temporary space to produce a global maximum, and obtaining the corresponding global maximum index from the second temporary space according to the global maximum.

Preferably, after the raw data is operated on by the data operation function, the method further comprises:

recording the result of the operation on the raw data in a preset result log.

In addition, the present invention also provides a data processing apparatus, comprising:

a process starting module, configured to start N processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each process;

a data distribution module, configured to obtain raw data and distribute each item of raw data to its corresponding target process;

a function operation module, configured to, in each target process, use OpenMP to divide the raw data into data fragments and distribute them to the preset threads, so that the raw data is operated on by the data operation function.

In addition, the present invention also provides a data processing device, comprising:

a memory, configured to store a computer program;

a processor, configured to implement the steps of the above data processing method when executing the computer program.

In addition, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above data processing method.

The data processing method provided by the present invention starts more than one process using MPI, with a data operation function and multiple threads preset in each process. Raw data is then obtained, and each item of raw data is distributed to its corresponding target process. In each target process, OpenMP is used to divide the raw data into data fragments and distribute them to the preset threads, so that the threads jointly operate on the raw data according to the data operation function. Because MPI and OpenMP work together, multiple processes obtain their corresponding raw data at the same time, and within each process OpenMP distributes that raw data to threads for parallel processing. This increases the total amount of raw data the computing device can process per unit time, and thereby improves the overall performance of computation on massive data. In addition, the present invention also provides a data processing apparatus, device and medium with the same beneficial effects.

Brief description of the drawings

To explain the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a data processing method provided by an embodiment of the present invention;

Fig. 2 is a structural diagram of a data processing apparatus provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The core of the present invention is to provide a data processing method that improves the overall performance of a computing device when processing massive data. Another core of the present invention is to provide a data processing apparatus, device and medium.

To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.

Embodiment one

Fig. 1 is a flowchart of a data processing method provided by an embodiment of the present invention. Referring to Fig. 1, the specific steps of the data processing method include:

Step S10: start N processes using MPI.

Here, N is an integer greater than 1, and a data operation function is preset in each process.

It should be noted that MPI (Message Passing Interface) is a language-independent communication protocol for programming parallel computers that supports both point-to-point and broadcast communication. MPI is a message-passing application programming interface that comes with protocol and semantic descriptions specifying how its features must behave in any implementation. MPI aims at high performance, scalability and portability, and it remains a dominant model in high-performance computing today. Because MPI lets all the processes in a specific group take part in global data processing and communication operations, and lets all of them synchronize at specific points, the N processes started in this step by calling the MPI interface can carry out the data processing work together. In addition, it should be emphasized that N in this step is an integer greater than 1; that is, more than one process must be started.

Step S11: obtain raw data, and distribute each item of raw data to its corresponding target process.

It should be noted that the raw data obtained in this step is the data to be processed, and in this step it is distributed to the corresponding target processes. There is a correspondence between raw data and target processes: a given item of raw data is operated on only by its corresponding target process. The specific type and content of the raw data depend on the actual usage scenario and are not specifically limited here.
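The correspondence between raw data and target processes in steps S10 and S11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the patent: it assumes a contiguous block mapping from raw data items to MPI ranks, and the names `BlockRange`, `block_for_rank` and `run_with_mpi` are hypothetical. The MPI part uses only the standard calls `MPI_Init`, `MPI_Comm_rank`, `MPI_Comm_size` and `MPI_Finalize`, and is compiled only when the hypothetical macro `SKETCH_USE_MPI` is defined, so the mapping itself can be checked without an MPI installation.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of raw data items owned by one process.
struct BlockRange {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical mapping: split num_items raw data items into contiguous
// blocks, one per process, so each item belongs to exactly one rank.
BlockRange block_for_rank(std::size_t num_items, int num_procs, int rank) {
    assert(num_procs > 1);                  // the method requires N > 1
    assert(rank >= 0 && rank < num_procs);
    const std::size_t n = static_cast<std::size_t>(num_procs);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = num_items / n;
    const std::size_t extra = num_items % n;  // first `extra` ranks get one more
    const std::size_t begin = r * base + (r < extra ? r : extra);
    const std::size_t len = base + (r < extra ? 1 : 0);
    return BlockRange{begin, begin + len};
}

#ifdef SKETCH_USE_MPI
#include <mpi.h>

// Each of the N processes started by the MPI launcher works out its own share.
int run_with_mpi(int argc, char** argv, std::size_t num_items) {
    MPI_Init(&argc, &argv);
    int rank = 0, num_procs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    const BlockRange mine = block_for_rank(num_items, num_procs, rank);
    // Load items mine.begin .. mine.end - 1 here and run the operation function.
    (void)mine;
    MPI_Finalize();
    return 0;
}
#endif
```

With 10 items and 4 processes, this mapping gives ranks 0 to 3 the ranges [0, 3), [3, 6), [6, 8) and [8, 10). Other mappings, such as round-robin, would satisfy the same one-item-one-process correspondence.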

Step S12: in each target process, use OpenMP to divide the raw data into data fragments and distribute them to the preset threads, so that the raw data is operated on by the data operation function.

It should be noted that OpenMP provides a high-level abstract description of parallel algorithms: the programmer expresses intent by adding dedicated #pragma directives to the source code, and the compiler then parallelizes the program automatically, adding synchronization, mutual exclusion and communication where necessary. In this step, therefore, OpenMP divides the raw data into data fragments and distributes them to the preset threads in the target process. Each preset thread then operates on its data fragment in parallel according to the data operation function. This achieves parallel processing of the raw data and ensures the efficiency with which each target process handles its raw data.
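The division into data fragments in step S12 can be illustrated with a short sketch. It assumes the raw data is a flat array of `float` and uses a hypothetical operation function `square` and a hypothetical helper `apply_operation`; neither name comes from the patent. With OpenMP enabled at compile time, `schedule(static)` gives each thread one contiguous fragment of the iteration range; without OpenMP the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a preset data operation function.
inline float square(float x) { return x * x; }

// Apply `op` to every element of data[0..n). The static schedule hands each
// thread one contiguous fragment of the array.
template <typename Op>
void apply_operation(float* data, std::size_t n, Op op) {
    assert(data != nullptr || n == 0);
    const long count = static_cast<long>(n);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < count; ++i) {
        data[i] = op(data[i]);
    }
}
```

Because each iteration writes only its own element, the threads need no synchronization inside the loop.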

The data processing method provided by the present invention starts more than one process using MPI, with a data operation function and multiple threads preset in each process. Raw data is then obtained, and each item of raw data is distributed to its corresponding target process. In each target process, OpenMP is used to divide the raw data into data fragments and distribute them to the preset threads, so that the threads jointly operate on the raw data according to the data operation function. Because MPI and OpenMP work together, multiple processes obtain their corresponding raw data at the same time, and within each process OpenMP distributes that raw data to threads for parallel processing. This increases the total amount of raw data the computing device can process per unit time, and thereby improves the overall performance of computation on massive data.

Embodiment two

On the basis of the above embodiment, the present invention also provides a series of preferred implementations.

As a preferred implementation, starting N processes using MPI is specifically:

starting N deConvolution processes using MPI;

correspondingly, the raw data is specifically a raw data array;

correspondingly, the data operation function specifically includes the findPeaks function and the subtractPSF function.

This implementation is suitable for data processing in SKA (Square Kilometer Array astronomical telescope) and similar scenarios. It should be noted that science data processing (SDP) is the most critical part of the SKA project. The algorithms involved in SDP include deGridding, deConvolution, FFT and Projection, and the deConvolution (image deconvolution) algorithm accounts for about 20% of the total SDP computation, making it a relatively important part of SDP. To improve the efficiency of processing data with the deConvolution algorithm, this implementation starts the deConvolution processes using MPI, so that multiple sets of image data are processed at the same time and the computing performance of the processor and the bandwidth of the device are fully used.

The findPeaks function finds the maximum value in an astronomical image array and the index position of that maximum; specifically, it traverses the array and compares elements to obtain the maximum value and its index position. The subtractPSF function uses the maximum value of the image array and its index to determine which part of the image needs updating, and then updates the image data; specifically, it computes the image region to be updated from the maximum value index and then updates the image data in that region. Since the findPeaks and subtractPSF functions are operation functions well known to those skilled in the art, their specific contents are not described further here. In this implementation, parallel deConvolution processes, together with the parallel preset threads inside each deConvolution process, process the raw data arrays corresponding to astronomical images, which improves the overall efficiency of executing the deConvolution algorithm.
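For readers unfamiliar with the two functions, a serial sketch follows. The patent does not give their bodies, so the details here are assumptions: the update rule is the usual Högbom CLEAN step (subtract `gain * peak * PSF`, with the PSF shifted so that its own peak lands on the image peak), both arrays are square and row-major, and the names `Peak`, `find_peak` and `subtract_psf` and their signatures are hypothetical rather than taken from the ASKAP code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Peak {
    float value;
    std::size_t index;  // flat index into the row-major array
};

// Traverse the array and keep the largest value and its index.
Peak find_peak(const std::vector<float>& image) {
    assert(!image.empty());
    Peak best{image[0], 0};
    for (std::size_t i = 1; i < image.size(); ++i) {
        if (image[i] > best.value) {
            best = Peak{image[i], i};
        }
    }
    return best;
}

// Assumed update: residual -= gain * peak.value * psf, with the PSF shifted
// so that its own peak (psf_peak_index) lands on the image peak. Both arrays
// are width x width; only the overlapping region is touched.
void subtract_psf(std::vector<float>& residual, const std::vector<float>& psf,
                  std::size_t width, std::size_t psf_peak_index,
                  const Peak& peak, float gain) {
    assert(residual.size() == width * width && psf.size() == width * width);
    const long w = static_cast<long>(width);
    const long dx = static_cast<long>(peak.index % width) -
                    static_cast<long>(psf_peak_index % width);
    const long dy = static_cast<long>(peak.index / width) -
                    static_cast<long>(psf_peak_index / width);
    // Rows and columns of the residual that overlap the shifted PSF.
    const long y0 = dy > 0 ? dy : 0, y1 = dy > 0 ? w : w + dy;
    const long x0 = dx > 0 ? dx : 0, x1 = dx > 0 ? w : w + dx;
    for (long y = y0; y < y1; ++y) {
        for (long x = x0; x < x1; ++x) {
            residual[static_cast<std::size_t>(y * w + x)] -=
                gain * peak.value *
                psf[static_cast<std::size_t>((y - dy) * w + (x - dx))];
        }
    }
}
```

In a deconvolution loop, these two calls would typically alternate: find the current peak, subtract the scaled PSF around it, and repeat until a stopping criterion is met.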

On the basis of the above implementation, as a preferred implementation, starting N deConvolution processes using MPI is specifically:

starting the N deConvolution processes using MPI on a computing device equipped with an Intel KNM processor.

It should be noted that the Intel KNM (Knights Mill) processor is a many-core processor released by Intel. A KNM processor has a large number of logical cores, but the clock frequency of each core is relatively low and its single-core performance is relatively weak, so a program that runs time-consuming serial code on it often performs poorly. This implementation executes parallel deConvolution processes on the KNM processor, so every core of the KNM processor is put to reasonable use. This exploits the many cores of the KNM processor as far as possible and further improves overall data processing efficiency.

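In addition, on the basis of the above implementation, as a preferred implementation, a #pragma simd directive is preset in the findPeaks function and the subtractPSF function;

correspondingly, before the raw data array is operated on by the data operation function, the method further comprises:

compiling the data operation function with the AVX512 instruction set enabled in the compilation.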

Both the findPeaks function and the subtractPSF function involve loop iterations. To instruct the compiler to vectorize a loop when it encounters one, and thereby markedly improve the execution efficiency of these functions, the "#pragma simd" directive can be added before the for loop statement, and the compile option "-xCOMMON-AVX512" can be added at compile time to enable the AVX512 instruction set. This scheme makes full use of the AVX512 instruction set on KNM and improves program performance through a high degree of vectorization.
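The directive and compile option named above are specific to the Intel compiler. The sketch below swaps in the standard OpenMP directive `#pragma omp simd`, which expresses the same vectorization request portably; the function name `subtract_scaled` is hypothetical. With GCC or Clang, the flags `-fopenmp-simd -mavx512f` would be one way to enable the directive and the AVX-512 instructions, which is an assumption about the toolchain rather than something stated in the patent.

```cpp
#include <cassert>
#include <cstddef>

// Inner loop in the style of subtractPSF: out[i] -= scale * psf[i].
// `#pragma omp simd` asks the compiler to vectorize the loop; it is ignored
// when OpenMP SIMD support is not enabled at compile time.
void subtract_scaled(float* out, const float* psf, std::size_t n, float scale) {
    assert(out != nullptr && psf != nullptr);
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] -= scale * psf[i];
    }
}
```

The loop is a good vectorization candidate because every iteration is independent and the arrays are accessed with unit stride.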

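In addition, on the basis of the above implementation, as a preferred implementation, obtaining the raw data is specifically:

obtaining the raw data array from MCDRAM memory, where the raw data array in the MCDRAM memory has been 8-byte aligned in advance.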

It should be noted that MCDRAM (Multi-Channel Dynamic Random Access Memory) is equivalent to memory with multiple memory controllers in the memory chip. The memory controllers can work independently of one another, and each controls its own memory channel, so MCDRAM offers several times the bandwidth and read speed of single-channel memory and processes data more efficiently. In addition, the raw data array in the MCDRAM memory is 8-byte aligned in advance so that data moves efficiently between memories. For a KNM processor, memory data movement is optimal when the start address of the data lies on an 8-byte boundary, which improves the overall efficiency of data acquisition. To help the compiler vectorize, memory must be allocated with 8-byte alignment, and pragmas/directives must be used to tell the compiler that the memory accesses are aligned. In the code, heap data can be allocated with alignment using "_mm_malloc" and released using "_mm_free"; in addition, a "#pragma vector aligned" directive must be inserted to tell the compiler that all arrays accessed in a particular loop are aligned.
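Aligned allocation and the alignment promise to the compiler can be sketched as follows. The text names `_mm_malloc`, `_mm_free` and `#pragma vector aligned`, which are tied to x86 intrinsics headers and the Intel compiler; this sketch swaps in the POSIX call `posix_memalign` (released with `free`) and the OpenMP `aligned` clause so that it builds on other toolchains. The helper names `alloc_aligned_floats`, `is_aligned` and `sum_aligned` are hypothetical, and the 8-byte alignment is taken from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Allocate `count` floats whose start address is a multiple of `alignment`
// bytes. Returns nullptr on failure; release the memory with free().
float* alloc_aligned_floats(std::size_t count, std::size_t alignment) {
    void* p = nullptr;
    if (posix_memalign(&p, alignment, count * sizeof(float)) != 0) {
        return nullptr;
    }
    return static_cast<float*>(p);
}

// True when `p` lies on an `alignment`-byte boundary.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Sum an array that the caller promises is 8-byte aligned; the aligned
// clause passes that promise on to the compiler.
float sum_aligned(const float* data, std::size_t n) {
    assert(is_aligned(data, 8));
    float total = 0.0f;
    #pragma omp simd aligned(data : 8) reduction(+ : total)
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The alignment promise is only safe when every array passed to the loop was allocated through the aligned allocator, which is why the sketch checks it with an assertion.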

In addition, techniques such as memory prefetch can be used to improve memory access efficiency and cache utilization. In the innermost loop of the core of the deConvolution algorithm, the "_mm_prefetch" function can be called to prefetch the data arrays needed by the next several computation steps and place them in the cache in advance. This avoids cache misses when the data is read during computation and improves memory access efficiency.
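A minimal sketch of software prefetching in a loop is shown below. Instead of the x86 intrinsic `_mm_prefetch` named in the text, it uses the GCC/Clang builtin `__builtin_prefetch`, which plays the same role on those compilers; the function name `sum_with_prefetch` and the choice of prefetch distance are hypothetical, and a useful distance would have to be found by measurement.

```cpp
#include <cassert>
#include <cstddef>

// Sum `data` while asking the hardware to start loading the element that
// will be needed `distance` iterations from now. The prefetch is only a
// hint: it can change timing, never the result.
float sum_with_prefetch(const float* data, std::size_t n, std::size_t distance) {
    assert(data != nullptr || n == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            // Arguments: address, 0 = prefetch for read, 3 = high locality.
            __builtin_prefetch(&data[i + distance], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

For a simple sequential scan like this one, hardware prefetchers often do the job already; explicit prefetching tends to matter more for access patterns the hardware cannot predict.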

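In addition, on the basis of the above implementation, as a preferred implementation, the specific steps of using OpenMP to divide the raw data array into data fragments and distribute them to the preset threads, so that the raw data array is operated on by the findPeaks function, include:

allocating a first temporary space and a second temporary space in advance;

controlling each preset thread to operate on its corresponding data fragment with the findPeaks function, recording the resulting maximum value in the first temporary space, and recording the index of that maximum value in the second temporary space;

when the operation on every data fragment is complete, performing a reduction over the maximum values in the first temporary space to produce a global maximum, and obtaining the corresponding global maximum index from the second temporary space according to the global maximum.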

It should be noted that when the findPeaks function is run on the data fragments by multiple threads in parallel, each thread produces the maximum value of its own data fragment, and a further reduction over these maxima is needed to obtain the global maximum of the raw data array. The existing deConvolution program uses an OpenMP critical directive to perform the reduction over the maxima of the threads in a process. With a critical directive, however, the maximum value being operated on is "locked" in advance and cannot be accessed again while that reduction step is in progress. This increases the time threads spend waiting, lowers execution efficiency, and is the bottleneck of the computing performance of the findPeaks function. To address this problem, temporary space can be allocated for each process, for example the arrays "temp_Peak" and "temp_Pos". Each preset thread is controlled to operate on its corresponding data fragment with the findPeaks function, record the resulting maximum value in the temp_Peak array, and record the index of that maximum value in the temp_Pos array. When every thread has finished operating on its data fragment, a reduction over the maxima in the temp_Peak array produces the global maximum, and the corresponding global maximum index is obtained from the temp_Pos array according to the global maximum. This implementation avoids "locking" the maximum value during computation and improves overall computing efficiency.
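The lock-free reduction described above can be sketched as follows. This is an illustration under stated assumptions rather than the patented code: the slots are indexed by fragment number instead of by thread number (so the sketch needs no OpenMP runtime calls and also runs serially when OpenMP is disabled), and the names `GlobalPeak` and `find_peak_fragments` are hypothetical. The arrays `temp_peak` and `temp_pos` play the roles of temp_Peak and temp_Pos.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GlobalPeak {
    float value;
    std::size_t index;
};

// Split `data` into `num_fragments` contiguous fragments. Each fragment's
// maximum goes to its own slot in temp_peak and its index to temp_pos, so
// no two threads write to the same location and no critical section is needed.
GlobalPeak find_peak_fragments(const std::vector<float>& data, int num_fragments) {
    assert(!data.empty() && num_fragments > 0);
    const std::size_t n = data.size();
    const std::size_t f = static_cast<std::size_t>(num_fragments);
    std::vector<float> temp_peak(f);       // first temporary space
    std::vector<std::size_t> temp_pos(f);  // second temporary space

    #pragma omp parallel for schedule(static)
    for (int t = 0; t < num_fragments; ++t) {
        const std::size_t begin = n * static_cast<std::size_t>(t) / f;
        const std::size_t end = n * (static_cast<std::size_t>(t) + 1) / f;
        float best = data[0];  // valid candidate, never above the global maximum
        std::size_t best_pos = 0;
        for (std::size_t i = begin; i < end; ++i) {
            if (data[i] > best) { best = data[i]; best_pos = i; }
        }
        temp_peak[static_cast<std::size_t>(t)] = best;
        temp_pos[static_cast<std::size_t>(t)] = best_pos;
    }

    // Serial reduction over the per-fragment maxima.
    GlobalPeak result{temp_peak[0], temp_pos[0]};
    for (std::size_t t = 1; t < f; ++t) {
        if (temp_peak[t] > result.value) {
            result = GlobalPeak{temp_peak[t], temp_pos[t]};
        }
    }
    return result;
}
```

The final reduction touches only one value per fragment, so running it serially costs little compared with the parallel scan, and the threads never wait on one another while scanning.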

In addition, as a preferred implementation, after the raw data is operated on by the data operation function, the method further comprises:

recording the result of the operation on the raw data in a preset result log.

It can be understood that, because the result of the operation on the raw data is recorded in a preset result log, the user can retrieve the result from the log relatively efficiently when needed, which facilitates subsequent analysis of the result.

Embodiment three

The embodiments of the data processing method have been described in detail above. The present invention also provides a data processing apparatus corresponding to the method. Since the embodiments of the apparatus correspond to the embodiments of the method, refer to the description of the method embodiments for the apparatus embodiments; details are not repeated here.

Fig. 2 is a structural diagram of a data processing apparatus provided by an embodiment of the present invention. The data processing apparatus provided by the embodiment of the present invention comprises:

a process starting module 10, configured to start N processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each process;

a data distribution module 11, configured to obtain raw data and distribute each item of raw data to its corresponding target process;

a function operation module 12, configured to, in each target process, use OpenMP to divide the raw data into data fragments and distribute them to the preset threads, so that the raw data is operated on by the data operation function.

The data processing apparatus provided by the present invention starts more than one process using MPI, with a data operation function and multiple threads preset in each process. Raw data is then obtained, and each item of raw data is distributed to its corresponding target process. In each target process, OpenMP is used to divide the raw data into data fragments and distribute them to the preset threads, so that the threads jointly operate on the raw data according to the data operation function. Because MPI and OpenMP work together, multiple processes obtain their corresponding raw data at the same time, and within each process OpenMP distributes that raw data to threads for parallel processing. This increases the total amount of raw data the computing device can process per unit time, and thereby improves the overall performance of computation on massive data.

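Embodiment four

The present invention also provides a data processing device, comprising:

a memory, configured to store a computer program;

a processor, configured to implement the steps of the above data processing method when executing the computer program.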

The data processing device provided by the present invention starts more than one process using MPI, with a data operation function and multiple threads preset in each process. Raw data is then obtained, and each item of raw data is distributed to its corresponding target process. In each target process, OpenMP is used to divide the raw data into data fragments and distribute them to the preset threads, so that the threads jointly operate on the raw data according to the data operation function. Because MPI and OpenMP work together, multiple processes obtain their corresponding raw data at the same time, and within each process OpenMP distributes that raw data to threads for parallel processing. This increases the total amount of raw data the computing device can process per unit time, and thereby improves the overall performance of computation on massive data.

The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above data processing method.

The computer-readable storage medium provided by the present invention starts more than one process using MPI, with a data operation function and multiple threads preset in each process. Raw data is then obtained, and each item of raw data is distributed to its corresponding target process. In each target process, OpenMP is used to divide the raw data into data fragments and distribute them to the preset threads, so that the threads jointly operate on the raw data according to the data operation function. Because MPI and OpenMP work together, multiple processes obtain their corresponding raw data at the same time, and within each process OpenMP distributes that raw data to threads for parallel processing. This increases the total amount of raw data the computing device can process per unit time, and thereby improves the overall performance of computation on massive data.

The data processing method, apparatus, device and medium provided by the present invention have been described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; refer to the description of the method for the relevant details. It should be pointed out that those of ordinary skill in the art can make improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.

Claims

1. A data processing method, characterized by comprising:

starting N processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each process;

obtaining raw data, and distributing each item of raw data to its corresponding target process;

in each target process, using OpenMP to divide the raw data into data fragments and distribute them to the preset threads, so that the raw data is operated on by the data operation function.

2. The method according to claim 1, characterized in that starting N processes using MPI is specifically:

starting N deConvolution processes using the MPI;

correspondingly, the raw data is specifically a raw data array;

correspondingly, the data operation function specifically includes the findPeaks function and the subtractPSF function.

3. The method according to claim 2, characterized in that starting N deConvolution processes using the MPI is specifically:

starting the N deConvolution processes using the MPI on a computing device equipped with an Intel KNM processor.

4. The method according to claim 3, characterized in that a #pragma simd directive is preset in the findPeaks function and the subtractPSF function;

correspondingly, before the raw data array is operated on by the data operation function, the method further comprises:

compiling the data operation function with the AVX512 instruction set enabled in the compilation.

5. The method according to claim 4, characterized in that obtaining the raw data is specifically:

obtaining the raw data array from MCDRAM memory, where the raw data array in the MCDRAM memory has been 8-byte aligned in advance.

6. The method according to claim 2, characterized in that the specific steps of using OpenMP to divide the raw data array into data fragments and distribute them to the preset threads, so that the raw data array is operated on by the findPeaks function, include:

allocating a first temporary space and a second temporary space in advance;

controlling each preset thread to operate on its corresponding data fragment with the findPeaks function, recording the resulting maximum value in the first temporary space, and recording the index of that maximum value in the second temporary space;

when the operation on every data fragment is complete, performing a reduction over the maximum values in the first temporary space to produce a global maximum, and obtaining the corresponding global maximum index from the second temporary space according to the global maximum.

7. The method according to any one of claims 1 to 6, characterized in that, after the raw data is operated on by the data operation function, the method further comprises:

recording the result of the operation on the raw data in a preset result log.

8. A data processing apparatus, characterized by comprising:

a process starting module, configured to start N processes using MPI, where N is an integer greater than 1 and a data operation function is preset in each process;

a data distribution module, configured to obtain raw data and distribute each item of raw data to its corresponding target process;

a function operation module, configured to, in each target process, use OpenMP to divide the raw data into data fragments and distribute them to the preset threads, so that the raw data is operated on by the data operation function.

9. A data processing device, characterized by comprising:

a memory, configured to store a computer program;

a processor, configured to implement the steps of the data processing method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 7.