CN117194041B - Parallel optimization method and system for high-performance computer - Google Patents


Info

Publication number
CN117194041B
Authority
CN
China
Prior art keywords: storage, data, parallel, delay, modules
Prior art date
Legal status
Active
Application number
CN202311245601.4A
Other languages
Chinese (zh)
Other versions
CN117194041A (en)
Inventor
蒋丽娟
陈洪波
贾世奇
Current Assignee
Beijing Qiangyun Innovation Technology Co ltd
Original Assignee
Beijing Qiangyun Innovation Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qiangyun Innovation Technology Co ltd filed Critical Beijing Qiangyun Innovation Technology Co ltd
Priority to CN202311245601.4A priority Critical patent/CN117194041B/en
Publication of CN117194041A publication Critical patent/CN117194041A/en
Application granted
Publication of CN117194041B publication Critical patent/CN117194041B/en


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a parallel optimization method and system for a high-performance computer, applied in the technical field of data processing. The method comprises the following steps: obtaining m storage modules according to the computer structure; obtaining the calculation characteristics of the m storage modules, dividing the modules accordingly, and obtaining a storage module classification result; receiving a plurality of computing processes, matching them against the categories in the storage module classification result, and outputting several classes of computing processes; establishing a parallel process queue for each class of computing process and its corresponding storage modules, and outputting N process sequence tables; establishing N data preprocessing units corresponding to the N process sequence tables and connecting them with each class of storage module; and obtaining the parallel computing processes of any process sequence table, preprocessing their data, and then reassigning the data to the corresponding class of storage modules, where the control memory issues instructions for data processing. The method solves the technical problems of prior-art parallel processing methods: low calculation efficiency and a high failure rate of parallel process calculation.

Description

Parallel optimization method and system for high-performance computer
Technical Field
The present invention relates to the field of data processing, and in particular, to a parallel optimization method and system for a high performance computer.
Background
Parallel processing is a computing approach in which a computer system executes multiple processes simultaneously, thereby shortening the processing time of complex procedures. However, in prior-art parallel processing methods, the delays of the individual threads are not synchronized before the next process is processed, which lowers overall computing efficiency and leads to a high process-calculation failure rate.
Therefore, the parallel processing methods of the prior art still suffer from the technical problems of low calculation efficiency and a high failure rate of parallel process calculation.
Disclosure of Invention
The parallel optimization method and system for a high-performance computer provided by the present application solve the technical problems that prior-art parallel processing methods still have low calculation efficiency and a high failure rate of parallel process calculation.
The application provides a parallel optimization method for a high-performance computer, which comprises the following steps: obtaining m storage modules according to a computer structure, wherein the m storage modules are connected with a control memory through a data bus; acquiring calculation characteristics of the m storage modules, dividing the m storage modules according to the calculation characteristics, and acquiring a storage module classification result; receiving a plurality of computing processes, matching the computing processes with categories in the classification result of the storage module according to the data types of the computing processes, and outputting a plurality of types of computing processes; establishing parallel process queues corresponding to each type of computing process and the storage module of the matched type, and outputting N process sequence tables; correspondingly establishing N data preprocessing units according to the data types of the N process sequence tables, and respectively connecting the N data preprocessing units with each type of storage module; and acquiring a parallel computing process of any process sequence table, inputting the data of the parallel computing process into a corresponding data preprocessing unit for data preprocessing, and then reassigning the data to a corresponding class of storage module for data processing by an instruction issued by the control memory.
The present application also provides a parallel optimization system for a high performance computer, the system comprising: the structure acquisition module is used for acquiring m storage modules according to a computer structure, wherein the m storage modules are connected with the control memory through a data bus; the classification result acquisition module is used for acquiring the calculation characteristics of the m storage modules, dividing the m storage modules according to the calculation characteristics and acquiring the classification results of the storage modules; the class matching module is used for receiving a plurality of computing processes, matching the class in the classification result of the storage module according to the data types of the computing processes, and outputting a plurality of classes of computing processes; the sequence table construction module is used for correspondingly establishing a parallel process queue by using each type of computing process and the storage module of the matched type and outputting N process sequence tables; the preprocessing module is used for correspondingly establishing N data preprocessing units according to the data types of the N process sequence tables, and respectively connecting the N data preprocessing units with each type of storage module; the data processing module is used for acquiring a parallel computing process of any process sequence table, inputting the data of the parallel computing process into a corresponding data preprocessing unit for data preprocessing, and then reassigning the data to a corresponding class of storage module for data processing by an instruction issued by the control memory.
The application also provides an electronic device, comprising:
a memory for storing executable instructions;
and the processor is used for realizing the parallel optimization method for the high-performance computer when executing the executable instructions stored in the memory.
The present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements a parallel optimization method for a high performance computer provided by the present application.
According to the parallel optimization method and system for the high-performance computer, m storage modules are acquired according to a computer structure. And obtaining the calculation characteristics of the m storage modules for division, and obtaining the classification results of the storage modules. And receiving a plurality of computing processes, matching the computing processes with categories in the classification result of the storage module, and outputting a plurality of computing processes. And each class of computing process corresponds to the storage module to establish a parallel process queue, and N process sequence lists are output. And correspondingly establishing N data preprocessing units according to the N process sequence lists, and connecting with each type of storage module. And acquiring parallel computing processes of any process sequence table, preprocessing data, and then reassigning the data to corresponding types of storage modules, and issuing instructions by a control memory for data processing. By optimizing the process processing in various aspects such as data processing category, calculation delay and storage space, the processing efficiency of the process is improved, and the failure rate of the process calculation is reduced. The method solves the technical problems of low calculation efficiency and high failure rate of parallel process calculation in the parallel processing method in the prior art.
The foregoing is only an overview of the technical solutions of the present application, which may be implemented according to the content of the specification. To make the technical means of the present application clearer, and to make the above and other objects, features, and advantages of the present application easier to understand, a detailed description of the present application is given below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments of the present disclosure will be briefly described below. It is apparent that the figures in the following description relate only to some embodiments of the present disclosure and are not limiting of the present disclosure.
FIG. 1 is a schematic flow chart of a parallel optimization method for a high performance computer according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of spatial constraint on a process sequence table by a parallel optimization method for a high-performance computer according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of delay compensation optimization performed by a parallel optimization method for a high-performance computer according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a system for parallel optimization of a high performance computer according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device for the parallel optimization method for a high-performance computer according to an embodiment of the present invention.
Reference numerals illustrate: the device comprises a structure acquisition module 11, a classification result acquisition module 12, a category matching module 13, a sequence table construction module 14, a preprocessing module 15, a data processing module 16, a processor 31, a memory 32, an input device 33 and an output device 34.
Detailed Description
Example 1
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings, and the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third", and the like are used merely to distinguish similar objects and do not denote a particular ordering. It should be understood that "first", "second", and "third" may, where permitted, be interchanged in a particular order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only.
While the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server, the modules are merely illustrative, and different aspects of the system and method may use different modules.
A flowchart is used in this application to describe the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in order precisely. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to or removed from these processes.
As shown in fig. 1, an embodiment of the present application provides a parallel optimization method for a high performance computer, where the method includes:
obtaining m storage modules according to a computer structure, wherein the m storage modules are connected with a control memory through a data bus;
acquiring calculation characteristics of the m storage modules, dividing the m storage modules according to the calculation characteristics, and acquiring a storage module classification result;
receiving a plurality of computing processes, matching the computing processes with categories in the classification result of the storage module according to the data types of the computing processes, and outputting a plurality of types of computing processes;
parallel processing is a computing manner in a computer system that can perform multiple processes simultaneously, thereby saving processing time for complex processing procedures. However, in the prior art, since the delay of processing of each thread is not synchronous, the processing of the next process is affected, and thus the calculation efficiency is reduced. And obtaining m storage modules according to the computer structure, wherein the m storage modules are connected with the control memory through a data bus. Acquiring computing features of the m storage modules, wherein the computing features of the storage modules divide the m storage modules according to computing features when data in a control memory is called for computing, and features/field processing features of data read-write, such as computing features of memory processing data specially used for storing image data, and the like, and acquire storage module classification results, and preferred storage data types corresponding to different storage module classification types are different, including: a storage module category for storing images, a storage module category for storing data, and the like. Further, a plurality of computing processes are received, the computing processes are matched with categories in the classification result of the storage module according to the data types of the computing processes, the computing processes of multiple categories are output, the efficiency of parallel computing of data is reduced due to different data storage formats, and the storage module classification results are matched according to the types of the process data by classifying the storage modules, so that the unification of the storage formats is guaranteed, and the processing efficiency of the process data is improved conveniently.
As shown in fig. 2, the method provided in the embodiment of the present application further includes:
determining the real-time storage space of each storage module according to the running state of each storage module in the m storage modules;
generating a parallel storage space which is used for parallel processing by the real-time storage space;
and carrying out space constraint on the N process sequence tables according to the parallel storage space, wherein each of the N process sequence tables corresponds to one parallel storage space.
The real-time storage space of each storage module is determined according to the running state of each of the m storage modules. A parallel storage space available for parallel processing is then generated from the real-time storage space; the parallel storage space is the difference between the total storage space and the real-time (occupied) storage space. Further, the N process sequence tables are space-constrained according to the parallel storage space, i.e., the storage demand of each table is bounded by its parallel storage space, where each of the N process sequence tables corresponds to one parallel storage space.
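The space constraint can be illustrated with a minimal sketch, assuming storage sizes are plain byte counts; the function names are hypothetical, not from the patent:

```python
def parallel_space(total, used):
    """Parallel storage space = total capacity minus the real-time
    (currently occupied) storage space of the module."""
    return total - used

def fits(queue_bytes, total, used):
    """Space constraint: a process sequence table may only be scheduled
    if its data fits within its module's parallel storage space."""
    return queue_bytes <= parallel_space(total, used)
```

A scheduler would call `fits` once per process sequence table, using that table's own parallel storage space.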
The method provided by the embodiment of the application further comprises the following steps:
obtaining the storage module classification result, wherein the storage module types in the storage module classification result comprise N types of storage modules, and the number of the storage modules in each type is at least 2;
and acquiring real-time storage spaces of all modules in each type of storage module according to the N types of storage modules, identifying the real-time storage spaces of all modules in each type of storage module, and outputting N parallel storage spaces, wherein each parallel storage space corresponds to the storage space with the minimum residual storage space in one type of storage module.
A storage module classification result is obtained, in which the storage module types comprise N classes of storage modules, each class containing at least 2 modules; N is the number of storage module classification categories. Then, according to the N classes of storage modules, the real-time storage spaces of all modules in each class are acquired and identified, and N parallel storage spaces are output, where each parallel storage space corresponds to the smallest remaining storage space within its class of storage modules. Capping each class at its smallest remaining space avoids an increased process-calculation failure rate caused by overload.
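Taking each class's parallel space as the minimum remaining space among its modules can be sketched as follows (class names and sizes are illustrative assumptions):

```python
def parallel_spaces(classes):
    """classes: {class_name: [remaining_space_of_each_module, ...]}.
    Each class's parallel storage space is capped at the smallest
    remaining space in the class, so no single module is overloaded."""
    return {name: min(spaces) for name, spaces in classes.items()}

spaces = parallel_spaces({"image": [512, 384, 640], "numeric": [256, 300]})
```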
The method provided by the embodiment of the application further comprises the following steps:
performing uniformity recognition on the real-time storage spaces of all modules in each type of storage module to obtain N uniformity;
and judging the N uniformity values to identify the storage modules below the preset uniformity, screening out these extreme values, identifying the remaining storage space after screening, and outputting N parallel storage spaces.
Uniformity recognition is performed on the real-time storage spaces of all modules in each class of storage module to obtain N uniformity values, where N is the number of storage modules in each class. During uniformity recognition, the mean of the real-time storage spaces of all modules in the class is obtained, and each module's uniformity is the difference between its real-time storage space and that mean. The N uniformity values are then judged to identify the storage modules below the preset uniformity, i.e., a preset minimum uniformity threshold. When a module falls below the threshold, its stored data is abnormally small and the module may be faulty; extreme-value screening is therefore performed to screen out the corresponding storage modules. The remaining storage spaces after screening are identified, and N parallel storage spaces are output, where N is the number of storage modules remaining after the abnormal modules have been screened out.
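A hedged sketch of the uniformity screening, assuming uniformity is measured as the absolute deviation of each module's real-time storage space from the class mean (the patent does not pin down the exact measure or threshold semantics):

```python
def screen_by_uniformity(spaces, max_deviation):
    """spaces: real-time storage spaces of all modules in one class.
    Compute the class mean, measure each module's deviation from it,
    and screen out modules whose deviation exceeds the threshold
    (extreme values that suggest an abnormal module)."""
    mean = sum(spaces) / len(spaces)
    return [s for s in spaces if abs(s - mean) <= max_deviation]

kept = screen_by_uniformity([100, 105, 10], max_deviation=40)  # 10 is an outlier
```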
Establishing parallel process queues corresponding to each type of computing process and the storage module of the matched type, and outputting N process sequence tables;
correspondingly establishing N data preprocessing units according to the data types of the N process sequence tables, and respectively connecting the N data preprocessing units with each type of storage module;
and acquiring a parallel computing process of any process sequence table, inputting the data of the parallel computing process into a corresponding data preprocessing unit for data preprocessing, and then reassigning the data to a corresponding class of storage module for data processing by an instruction issued by the control memory.
Parallel process queues are established by pairing each class of computing process with the storage modules of the matched class, and N process sequence tables are output, where N is the number of storage module classes. N data preprocessing units are then established corresponding to the data types of the N process sequence tables; each data preprocessing unit acquires the data of its corresponding process sequence table and completes the preprocessing required before data processing, and the N data preprocessing units are respectively connected with each class of storage module. Finally, the parallel computing processes of any process sequence table are acquired, their data is input into the corresponding data preprocessing unit for preprocessing, and the data is then reassigned, by an instruction issued by the control memory, to the corresponding class of storage module for data processing, so that the parallel computing processes execute uniformly. By optimizing process handling in terms of data processing category, calculation delay, and storage space, the processing efficiency of the processes is improved and the failure rate of process calculation is reduced.
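The queue construction and dispatch described above might look like the following minimal sketch, where FIFO queues stand in for the process sequence tables; the names are assumptions, not the patent's:

```python
# Illustrative sketch: one parallel process queue (FIFO) per storage-module
# class; the N queues play the role of the N process sequence tables.
from collections import deque

def build_queues(classified_processes):
    """classified_processes: {class_name: [process_id, ...]}."""
    return {cls: deque(pids) for cls, pids in classified_processes.items()}

def dispatch(queues, cls):
    """Pop the next process of a class for preprocessing; afterwards the
    control memory would issue the instruction assigning its data to a
    storage module of that class. Returns None when the queue is empty."""
    return queues[cls].popleft() if queues[cls] else None

queues = build_queues({"image": [1, 3], "numeric": [2]})
first = dispatch(queues, "image")
```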
As shown in fig. 3, the method provided in the embodiment of the present application further includes:
acquiring data preprocessing delay, wherein the data preprocessing delay comprises a first data preprocessing delay and a second data preprocessing delay;
the first data preprocessing delay is delay of inputting the data of the parallel computing process into a corresponding data preprocessing unit for performing a data preprocessing process, and the second data preprocessing delay is delay of distributing the data into a corresponding class of storage modules for performing a data processing process by an instruction issued by the control memory;
and performing delay compensation optimization according to the first data preprocessing delay and the second data preprocessing delay.
A data preprocessing delay is acquired, comprising a first data preprocessing delay and a second data preprocessing delay. The first data preprocessing delay is the delay of inputting the data of the parallel computing process into the corresponding data preprocessing unit for preprocessing, i.e., the processing time of the data within the preprocessing unit. The second data preprocessing delay is the delay of distributing the data, by an instruction issued by the control memory, to the corresponding class of storage module for data processing, i.e., the processing time after preprocessing. Delay compensation optimization is performed according to the first and second data preprocessing delays, yielding a more accurate process calculation delay and eliminating process calculation errors caused by the delay.
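Measuring the two delays can be sketched with a generic stage timer; this is an assumption for illustration, as the patent does not specify the measurement mechanism:

```python
import time

def timed(stage_fn, *args):
    """Run one processing stage and measure its delay. Applied to the
    preprocessing stage this yields the first data preprocessing delay;
    applied to the storage-module processing stage it yields the second."""
    start = time.perf_counter()
    result = stage_fn(*args)
    return result, time.perf_counter() - start

value, first_delay = timed(lambda x: x + 1, 41)  # stand-in preprocessing stage
```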
The method provided by the embodiment of the application further comprises the following steps:
acquiring running data of the parallel computing process during computing, and acquiring parallel processing delay and asynchronous processing delay of the next process by using the running data;
establishing a delay loss function according to the data preprocessing delay, the parallel processing delay and the asynchronous processing delay;
and using the minimum loss function as a target condition, and using the control parameters of the command issued by the control memory as response to perform delay compensation optimization.
The running data of the parallel computing processes during calculation is acquired, and the parallel processing delay and the asynchronous processing delay of the next process are obtained from it; that is, the historical running data of the parallel computing processes is acquired, from which the processing delay (processing time) of the parallel computing processes and the asynchronous processing time of the next process are obtained. A delay loss function is established from the data preprocessing delay, the parallel processing delay, and the asynchronous processing delay, by summing the three to obtain the total delay of the parallel computing process's processing procedure. Finally, delay compensation optimization is performed with the minimum of the loss function, i.e., the minimum delay of the processing procedure, as the target condition and the control parameters of the instruction issued by the control memory as the response, thereby realizing delay compensation optimization of the parallel computing process.
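A minimal sketch of the delay loss function and its minimization over candidate control parameters, assuming a simple grid search (the patent names the target condition and response but not the optimizer, so the search strategy here is a stand-in):

```python
def delay_loss(pre_delay, parallel_delay, async_delay):
    """Delay loss = sum of the data preprocessing delay, the parallel
    processing delay, and the asynchronous next-process delay."""
    return pre_delay + parallel_delay + async_delay

def best_control_parameter(candidates, measure):
    """Pick the instruction control parameter whose measured delays
    minimize the loss. `measure(c)` returns the three delays observed
    under control parameter c (e.g., from historical running data)."""
    return min(candidates, key=lambda c: delay_loss(*measure(c)))

# Toy measurement model: larger parameter -> larger delays.
best = best_control_parameter([1, 2, 3], lambda c: (c, c, 0.0))
```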
The method provided by the embodiment of the application further comprises the following steps:
acquiring the computation complexity of the N process sequence lists, identifying the process sequence lists with the computation complexity greater than the preset computation complexity, and outputting an identification result;
wherein, the expression of the computational complexity is:
where t(n) is the computational complexity, n is the total number of processes in the corresponding process sequence table, and p is the number of processes handled in a single parallel pass; processes [0, p] are processed in parallel by the storage modules connected to the corresponding process sequence table, while processes [p+1, n] must be processed sequentially by a storage module; each process contributes its own computational complexity term, and 2^(i−log p) is the number of processes to be executed sequentially;
and correspondingly establishing a data preprocessing unit corresponding to each identification process sequence table according to the identification result.
Establishing the N data preprocessing units corresponding to the data types of the N process sequence tables further comprises: obtaining the N process sequence tables, identifying the process sequence tables whose computational complexity is greater than the preset computational complexity, and outputting an identification result. A corresponding preprocessing unit needs to be established only when the computational complexity in the identification result is large; process sequence tables of small computational complexity can operate without a preprocessing unit. Wherein the expression of the computational complexity is:
where t(n) is the computational complexity, n is the total number of processes in the corresponding process sequence table, and p is the number of processes handled in a single parallel pass; processes [0, p] are processed in parallel by the storage modules connected to the corresponding process sequence table, while processes [p+1, n] must be processed sequentially by a storage module; each process contributes its own computational complexity term, and 2^(i−log p) is the number of processes to be executed sequentially. Finally, a data preprocessing unit corresponding to each identified process sequence table is established according to the identification result.
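The identification rule, establishing a preprocessing unit only for tables whose complexity exceeds the preset threshold, can be sketched as follows (table names and values are hypothetical):

```python
def tables_needing_preprocessing(complexities, threshold):
    """complexities: {table_name: t_n} mapping each process sequence table
    to its computational complexity t(n). Only tables above the preset
    threshold get a dedicated data preprocessing unit; low-complexity
    tables skip it."""
    return [name for name, t_n in complexities.items() if t_n > threshold]

flagged = tables_needing_preprocessing({"A": 120.0, "B": 30.0}, threshold=100.0)
```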
According to the technical scheme provided by the embodiment of the invention, m storage modules are acquired according to a computer structure, wherein the m storage modules are connected with a control memory through a data bus. And obtaining the calculation characteristics of the m storage modules, dividing the m storage modules according to the calculation characteristics, and obtaining the classification results of the storage modules. And receiving a plurality of computing processes, matching the computing processes with the categories in the classification result of the storage module according to the data types of the computing processes, and outputting a plurality of types of computing processes. And establishing parallel process queues corresponding to the computing processes of each class and the storage modules of the matched class, and outputting N process sequence tables. And correspondingly establishing N data preprocessing units according to the data types of the N process sequence tables, and respectively connecting the N data preprocessing units with each type of storage module. And acquiring a parallel computing process of any process sequence table, inputting the data of the parallel computing process into a corresponding data preprocessing unit for data preprocessing, and then reassigning the data to a corresponding class of storage module for data processing by an instruction issued by the control memory. By optimizing the process processing in various aspects such as data processing category, calculation delay and storage space, the processing efficiency of the process is improved, and the failure rate of the process calculation is reduced. The method solves the technical problems of low calculation efficiency and high failure rate of parallel process calculation in the parallel processing method in the prior art.
Example two
Based on the same inventive concept as the parallel optimization method for a high-performance computer in the foregoing embodiments, the present invention also provides a system for a parallel optimization method for a high-performance computer, which may be implemented by hardware and/or software, and may be generally integrated in an electronic device, for performing the method provided by any of the embodiments of the present invention. As shown in fig. 4, the system includes:
the structure acquisition module 11 is used for acquiring m storage modules according to a computer structure, wherein the m storage modules are connected with the control memory through a data bus;
the classification result obtaining module 12 is configured to obtain calculation features of the m storage modules, divide the m storage modules according to the calculation features, and obtain a classification result of the storage modules;
the class matching module 13 is configured to receive a plurality of computing processes, match them against the categories in the storage module classification result according to their data types, and output a plurality of classes of computing processes;
the sequence table construction module 14 is configured to establish a parallel process queue for each class of computing processes and the storage modules of the matching class, and output N process sequence tables;
the preprocessing module 15 is used for correspondingly establishing N data preprocessing units according to the data types of the N process sequence tables, and respectively connecting the N data preprocessing units with each type of storage module;
the data processing module 16 is configured to obtain a parallel computing process of any process sequence table, input data of the parallel computing process into a corresponding data preprocessing unit for data preprocessing, and reassign the data to a corresponding class of storage module for data processing by the control memory issuing an instruction.
Further, the classification result obtaining module 12 is further configured to:
determining the real-time storage space of each storage module according to the running state of each storage module in the m storage modules;
generating, from the real-time storage space, a parallel storage space available for parallel processing;
and carrying out space constraint on the N process sequence tables according to the parallel storage space, wherein each of the N process sequence tables corresponds to one parallel storage space.
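The space constraint above can be sketched as follows. This is an illustrative assumption: the greedy admission policy and the (pid, data size) pairing are not mandated by the text, which only requires that each sequence table be bounded by its class's parallel storage space.

```python
def constrain_table(table, parallel_space):
    """Keep only the processes whose cumulative data size fits the
    parallel storage space of the table's module class. Greedy
    admission is an illustrative choice, not mandated by the patent."""
    admitted, used = [], 0
    for pid, size in table:
        if used + size <= parallel_space:
            admitted.append(pid)
            used += size
    return admitted

# Two sequence tables as (pid, data size) pairs, one parallel space each.
tables = [[(1, 4), (2, 3), (3, 5)], [(4, 2), (5, 2), (6, 2)]]
spaces = [8, 5]
constrained = [constrain_table(t, s) for t, s in zip(tables, spaces)]
# constrained == [[1, 2], [4, 5]]: process 3 (size 5) and process 6
# (size 2) would overflow their respective parallel spaces.
```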
Further, the classification result obtaining module 12 is further configured to:
obtaining the storage module classification result, wherein the storage module types in the storage module classification result comprise N types of storage modules, and the number of the storage modules in each type is at least 2;
and acquiring the real-time storage space of all modules in each class of storage modules according to the N classes, identifying these real-time storage spaces, and outputting N parallel storage spaces, wherein each parallel storage space corresponds to the minimum remaining storage space among the modules of one class.
Further, the classification result obtaining module 12 is further configured to:
performing uniformity identification on the real-time storage spaces of all modules in each class of storage modules to obtain N uniformity values;
and judging the N uniformity values to identify the classes of storage modules whose uniformity is smaller than a preset uniformity, screening out the extreme values within those classes, identifying the remaining storage spaces after screening, and outputting N parallel storage spaces.
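The uniformity judgment and extremum screening can be sketched as below. Both the uniformity metric (min/max ratio of remaining spaces) and the 0.5 threshold are illustrative assumptions; the patent only requires that uneven classes have their extreme values screened out before the parallel storage space is identified.

```python
def screened_parallel_space(spaces, uniformity_threshold=0.5):
    """When remaining spaces in a class are too uneven, drop the extreme
    values before taking the minimum; otherwise take the plain minimum."""
    uniformity = min(spaces) / max(spaces)
    if uniformity < uniformity_threshold and len(spaces) > 2:
        screened = sorted(spaces)[1:-1]  # screen out the extreme values
        return min(screened)
    return min(spaces)

# An uneven class: one nearly full module would otherwise dominate.
uneven = screened_parallel_space([100, 900, 800, 850])  # screens 100 and 900
even = screened_parallel_space([500, 600])              # uniform enough
```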
Further, the data processing module 16 is further configured to:
acquiring data preprocessing delay, wherein the data preprocessing delay comprises a first data preprocessing delay and a second data preprocessing delay;
the first data preprocessing delay is the delay of inputting the data of the parallel computing process into the corresponding data preprocessing unit for data preprocessing, and the second data preprocessing delay is the delay of distributing the data, by instructions issued by the control memory, into the storage modules of the corresponding class for data processing;
and performing delay compensation optimization according to the first data preprocessing delay and the second data preprocessing delay.
Further, the data processing module 16 is further configured to:
acquiring running data of the parallel computing process during computation, and obtaining from the running data the parallel processing delay and the asynchronous processing delay of the next process;
establishing a delay loss function according to the data preprocessing delay, the parallel processing delay and the asynchronous processing delay;
and performing delay compensation optimization by taking the minimum of the loss function as the target condition and the control parameters of the instructions issued by the control memory as the response.
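The delay-compensation step above can be sketched as a loss minimization. The linear loss form and the single control parameter `alpha` (standing in for the control parameters of the instructions issued by the control memory) are assumptions; the patent requires only that a loss be built from the preprocessing, parallel, and asynchronous delays and minimized.

```python
def delay_loss(alpha, pre1, pre2, parallel_delay, async_delay):
    """Illustrative delay loss combining the four delay terms named in
    the text; the functional form is an assumption."""
    return alpha * (pre1 + pre2) + (1.0 - alpha) * parallel_delay + async_delay

# Target condition: minimum loss; response: the control parameter.
# A coarse grid search stands in for whatever optimizer is used.
candidates = [i / 10 for i in range(11)]
best_alpha = min(candidates, key=lambda a: delay_loss(a, 2.0, 1.0, 4.0, 0.5))
# With preprocessing (2.0 + 1.0) cheaper than parallel processing (4.0),
# the search pushes alpha toward 1.0.
```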
Further, the preprocessing module 15 is further configured to:
acquiring the computational complexity of the N process sequence tables, identifying the process sequence tables whose computational complexity is greater than a preset computational complexity, and outputting an identification result;
wherein, the expression of the computational complexity is:
wherein t(n) is the computational complexity, n is the total number of processes in the corresponding process sequence table, and p is the number of processes in that table processed in parallel at one time; processes in [0, p] are processed in parallel by the storage modules connected to the corresponding process sequence table, while processes in [p+1, n] must be processed sequentially by a storage module; t_i is the computational complexity of each process, and 2^(i-1), i = 1, ..., log p, is the number of processes to be executed sequentially;
and correspondingly establishing a data preprocessing unit for each identified process sequence table according to the identification result.
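One plausible reading of the complexity check can be sketched as follows. Because the original equation image is not reproduced in the text, the formula below (parallel processes cost the maximum of their per-process complexities, sequential processes add) is a reconstruction and should be treated as an assumption.

```python
def total_complexity(costs, p):
    """Assumed reading of the garbled expression: processes [0, p] run in
    parallel, so their joint cost is that of the slowest one, while
    processes [p+1, n] run sequentially and their costs add."""
    parallel_part = max(costs[:p]) if p else 0
    sequential_part = sum(costs[p:])
    return parallel_part + sequential_part

def needs_dedicated_unit(costs, p, preset_complexity):
    # A sequence table is flagged for its own data preprocessing unit
    # when its complexity exceeds the preset threshold.
    return total_complexity(costs, p) > preset_complexity

t = total_complexity([3, 1, 2, 5, 4], p=3)  # max(3, 1, 2) + 5 + 4 = 12
```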
The units and modules included above are divided only according to functional logic, and the division is not limited thereto so long as the corresponding functions can be realized; in addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
Embodiment Three
Fig. 5 is a schematic structural diagram of an electronic device provided in a third embodiment of the present invention, showing a block diagram of an exemplary electronic device suitable for implementing an embodiment of the present invention. The electronic device shown in fig. 5 is only an example and should not be construed as limiting the functionality or scope of use of the embodiments of the present invention. As shown in fig. 5, the electronic device includes a processor 31, a memory 32, an input device 33 and an output device 34; the number of processors 31 in the electronic device may be one or more (one processor 31 is taken as an example in fig. 5), and the processor 31, memory 32, input device 33 and output device 34 in the electronic device may be connected by a bus or other means (bus connection is taken as an example in fig. 5).
The memory 32, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the parallel optimization method for a high-performance computer in the embodiments of the present invention. The processor 31 executes the various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 32, thereby implementing the parallel optimization method for a high-performance computer described above.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in detail through the above embodiments, it is not limited to them and may include many other equivalent embodiments without departing from the concept of the invention, the scope of which is determined by the appended claims.

Claims (10)

1. A parallel optimization method for a high performance computer, the method comprising:
obtaining m storage modules according to a computer structure, wherein the m storage modules are connected with a control memory through a data bus;
acquiring calculation characteristics of the m storage modules, dividing the m storage modules according to the calculation characteristics, and acquiring a storage module classification result;
receiving a plurality of computing processes, matching the computing processes with categories in the classification result of the storage module according to the data types of the computing processes, and outputting a plurality of types of computing processes;
establishing parallel process queues corresponding to each type of computing process and the storage module of the matched type, and outputting N process sequence tables;
correspondingly establishing N data preprocessing units according to the data types of the N process sequence tables, and respectively connecting the N data preprocessing units with each type of storage module;
and acquiring a parallel computing process of any process sequence table, inputting the data of the parallel computing process into a corresponding data preprocessing unit for data preprocessing, and then reassigning the data to a corresponding class of storage module for data processing by an instruction issued by the control memory.
2. The method of claim 1, wherein the method further comprises:
determining the real-time storage space of each storage module according to the running state of each storage module in the m storage modules;
generating, from the real-time storage space, a parallel storage space available for parallel processing;
and carrying out space constraint on the N process sequence tables according to the parallel storage space, wherein each of the N process sequence tables corresponds to one parallel storage space.
3. The method of claim 2, wherein generating, from the real-time storage space, a parallel storage space available for parallel processing further comprises:
obtaining the storage module classification result, wherein the storage module types in the storage module classification result comprise N types of storage modules, and the number of the storage modules in each type is at least 2;
and acquiring the real-time storage space of all modules in each class of storage modules according to the N classes, identifying these real-time storage spaces, and outputting N parallel storage spaces, wherein each parallel storage space corresponds to the minimum remaining storage space among the modules of one class.
4. The method of claim 3, wherein real-time storage space for all modules in each class of storage modules is identified, the method further comprising:
performing uniformity identification on the real-time storage spaces of all modules in each class of storage modules to obtain N uniformity values;
and judging the N uniformity values to identify the classes of storage modules whose uniformity is smaller than a preset uniformity, screening out the extreme values within those classes, identifying the remaining storage spaces after screening, and outputting N parallel storage spaces.
5. The method of claim 1, wherein the method further comprises:
acquiring data preprocessing delay, wherein the data preprocessing delay comprises a first data preprocessing delay and a second data preprocessing delay;
the first data preprocessing delay is the delay of inputting the data of the parallel computing process into the corresponding data preprocessing unit for data preprocessing, and the second data preprocessing delay is the delay of distributing the data, by instructions issued by the control memory, into the storage modules of the corresponding class for data processing;
and performing delay compensation optimization according to the first data preprocessing delay and the second data preprocessing delay.
6. The method of claim 5, wherein the method further comprises:
acquiring running data of the parallel computing process during computation, and obtaining from the running data the parallel processing delay and the asynchronous processing delay of the next process;
establishing a delay loss function according to the data preprocessing delay, the parallel processing delay and the asynchronous processing delay;
and performing delay compensation optimization by taking the minimum of the loss function as the target condition and the control parameters of the instructions issued by the control memory as the response.
7. The method of claim 1, wherein N data preprocessing units are correspondingly established according to the data types of the N process sequence lists, the method further comprising:
acquiring the computational complexity of the N process sequence tables, identifying the process sequence tables whose computational complexity is greater than a preset computational complexity, and outputting an identification result;
wherein, the expression of the computational complexity is:
wherein t(n) is the computational complexity, n is the total number of processes in the corresponding process sequence table, and p is the number of processes in that table processed in parallel at one time; processes in [0, p] are processed in parallel by the storage modules connected to the corresponding process sequence table, while processes in [p+1, n] must be processed sequentially by a storage module; t_i is the computational complexity of each process, and 2^(i-1), i = 1, ..., log p, is the number of processes to be executed sequentially;
and correspondingly establishing a data preprocessing unit for each identified process sequence table according to the identification result.
8. A parallel optimization system for a high performance computer, the system comprising:
the structure acquisition module is used for acquiring m storage modules according to a computer structure, wherein the m storage modules are connected with the control memory through a data bus;
the classification result acquisition module is used for acquiring the calculation characteristics of the m storage modules, dividing the m storage modules according to the calculation characteristics and acquiring the classification results of the storage modules;
the class matching module is used for receiving a plurality of computing processes, matching the class in the classification result of the storage module according to the data types of the computing processes, and outputting a plurality of classes of computing processes;
the sequence table construction module is used for correspondingly establishing a parallel process queue by using each type of computing process and the storage module of the matched type and outputting N process sequence tables;
the preprocessing module is used for correspondingly establishing N data preprocessing units according to the data types of the N process sequence tables, and respectively connecting the N data preprocessing units with each type of storage module;
the data processing module is used for acquiring a parallel computing process of any process sequence table, inputting the data of the parallel computing process into a corresponding data preprocessing unit for data preprocessing, and then reassigning the data to a corresponding class of storage module for data processing by an instruction issued by the control memory.
9. An electronic device, the electronic device comprising:
a memory for storing executable instructions;
a processor for implementing a parallel optimization method for a high performance computer according to any one of claims 1 to 7 when executing executable instructions stored in said memory.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a parallel optimization method for a high performance computer according to any of the claims 1-7.
CN202311245601.4A 2023-09-26 2023-09-26 Parallel optimization method and system for high-performance computer Active CN117194041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311245601.4A CN117194041B (en) 2023-09-26 2023-09-26 Parallel optimization method and system for high-performance computer

Publications (2)

Publication Number Publication Date
CN117194041A CN117194041A (en) 2023-12-08
CN117194041B (en) 2024-03-19

Family

ID=88994265


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020024534A (en) * 2018-08-07 2020-02-13 日本放送協会 Image classifier and program
CN114692745A (en) * 2022-03-24 2022-07-01 北京奕斯伟计算技术有限公司 Data processing method and device, integrated chip, electronic equipment and storage medium
KR20220094551A (en) * 2020-12-29 2022-07-06 주식회사 위고 System for performing searching and analysis based on in-memory computing for real-time data processing, analysis method, and computer program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230222768A1 (en) * 2022-01-13 2023-07-13 Nanjing University Of Posts And Telecommunications Multiscale point cloud classification method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant