CN102760085B - Communication track expanding method and device, communication track drive simulation method and system - Google Patents


Info

Publication number
CN102760085B
Authority
CN
China
Prior art keywords
communication
event
circulation
track
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110110818.5A
Other languages
Chinese (zh)
Other versions
CN102760085A (en)
Inventor
郝子宇
谢向辉
李宏亮
张昆
钱磊
吴东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201110110818.5A
Publication of CN102760085A
Application granted
Publication of CN102760085B

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a communication track expanding method and device. The communication track expanding method comprises the following steps: extracting the original communication track file of each process of an application program running at one operation scale; matching the attributes of the atomic events in the original communication track file, determining the grammatical relations among the atomic events, and organizing the atomic events according to those relations to form a grammatical event sequence; determining the meanings and mutual relations of the grammatical events in the grammatical event sequence, and constructing an algorithm event according to the mutual relations of the grammatical events; running the application program multiple times at a plurality of different operation scales to form a plurality of algorithm events; determining the relations among the algorithm events, which reflect the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology; and generating, on the basis of the relations among the algorithm events, a target communication track file of the application program running at a target scale. Embodiments of the invention also provide a communication track drive simulation method and system.

Description

Communication track expanding method and device, communication track drive simulation method and system
Technical field
The present invention relates to the field of communication track driven simulation, and in particular to a communication track expanding method and device and a communication track drive simulation method and system.
Background
Large-scale applications on high-performance computers can use on the order of 10^5 processors (computing cores). For high-performance computer architecture simulation that is based on a large-scale system-level parallel simulation platform and driven by communication tracks (trace-driven), the primary problem is obtaining the communication track (trace, also called a record) of a large-scale application. However, high-performance computer resources are limited and access to them is hard to obtain; and for target machines still under development, the communication track cannot be obtained by running the application directly.
The programming model is the link between the computer and the application: upward, it gives the application a way to use the underlying high-performance computer; downward, it gives the computer the behavioral characteristics of the application. The programming model is implemented with the user-level communication primitives (communication abstractions) of the machine system, which may be supported directly by hardware, or by the operating system or machine-specific user software. The programming model is either embedded in a parallel programming language or implemented in a programming environment, and maps general language constructs onto specific hardware. Mapping an application onto a high-performance computer with a programming model requires following a parallel algorithm design method. The most popular model in high-performance computing is the message passing programming model, and the Message Passing Interface (MPI) has become the de facto message passing standard.
Designing the parallel algorithm of an application generally follows these steps: (1) partitioning: decompose the whole computation into many smaller tasks, with the aim of exposing more parallelism; (2) communication: determine the data that must be exchanged during task execution and coordinate the execution of the tasks, thereby checking whether the partitioning is reasonable; (3) combination: review the results of the first two stages against the performance requirements and the implementation cost and, if necessary, combine small tasks into larger task groups to improve performance or reduce communication overhead; (4) mapping: assign each task to a processor, with the aim of minimizing global execution time and communication cost and maximizing processor utilization. The early stages mainly consider machine-independent characteristics such as maximum parallelism; machine-dependent characteristics are taken into account only in the later stages. The first and second stages focus on concurrency and scalability and seek algorithms with these properties; in the third and fourth stages attention shifts to locality and other performance-related issues. Parallel program design therefore has to consider four factors: application scale, machine scale, the differences among processes, and the execution of each process. These factors are mutually orthogonal and together determine the space of program structure complexity. Clearly, the communication track produced by running an application also changes with problem scale and machine scale; within a single run the communication tracks of the individual processes (threads) differ, and the communication track of a single process (thread) follows a certain pattern of change.
MPI is a specification of the message passing programming model and is implemented as communication libraries for specific languages. MPI is powerful, portable, and offers good communication performance; it is easy to understand and learn, and has become the de facto standard for message passing.
MPI programs are of the single program, multiple data (SPMD) type: all processes execute the same code, and each process steers its control flow and computation according to the total number of processes and its own process number. Because inter-process communication is relatively expensive, parallel algorithm design always tries to reduce the number of communications and the length of the communicated data and to keep computation local. Nevertheless, data and control dependences always cause communication between different processes, and different processes produce different communication track contents.
To analyze the communication patterns and performance of an MPI application, the communication functions called while the program runs can be recorded. Numerous projects have studied the communication behavior of applications.
ScalaTrace exploits the characteristics of stencil codes to achieve scalable compression and replay of MPI communication tracks. ScalaTrace assumes that, in a stencil code, a single process representing part of the logic always starts from a particular state, interacts with its neighbors, and updates its state until the system becomes stable. ScalaTrace works as follows: (1) extract the communication function track of each process of the MPI program; (2) search for repetition patterns within the communication track of a single process and describe them with RSDs (Regular Section Descriptors) and PRSDs (Power-RSDs), and make the process references location-independent, after which the communication tracks of some processes become identical; (3) combine the processed communication tracks of the processes and delete the repeated communication tracks to reduce the track volume; (4) finally obtain the losslessly compressed communication track. However, ScalaTrace analyzes only the patterns of local communication tracks, and the repetition patterns of a communication track depend on the number of processes, so ScalaTrace cannot adapt to a change in the number of processes.
Another line of research is the coordinated performance framework, whose analysis steps are: (1) making the communication track logical: record all message information during the program run, including the messages and the number of bytes exchanged, identify the main pairs of interacting processes, and convert them into a communication matrix; then analyze the communication matrix to determine the application-level communication topology. (2) Communication track compression: identify repeating patterns in the MPI message communication and find loops; a new loop-discovery method based on the Crochemore algorithm is implemented in two variants, optimized and greedy. (3) Performance framework construction: convert the logical, compressed communication track into an executable program that reproduces the behavior the communication track represents. Converting the communication track into executable C code follows these steps: i) loop nests in the communication track are converted into program loops, with the iteration counts reduced to match the target framework execution time; ii) collective and point-to-point communications in the communication track are converted into MPI communication calls that operate on synthetically generated data, and the point-to-point calls produce a global stencil communication pattern that matches the application topology; iii) the computation is replaced by an equivalent amount of synthetic computation code. However, the communication track is made logical only in terms of the relative position of a single process; when the number of participating processes changes, the relative position of a process changes and so does its logical position. The approach therefore does not suit a changing number of processes and applies only to prediction with a fixed number of processes.
In research on the statistical scalability of communication, the analysis has two steps. First, run multiple experiments while varying the number of tasks or the problem scale, and record information such as the call site and time of every communication operation. Second, analyze all communication tracks together to obtain the time information of each call site; use it to compute the share of total time taken by each call site; then relate the number of tasks to these shares and compute the relationship between them. This research mainly analyzes the factors that affect program execution performance; its results are statistical, and it does not explain the detailed communication patterns of the application.
Related art can also be found in Chinese patent application No. 200910093067.3, which discloses a method and system for extracting the communication patterns of parallel programs. The method and system reduce the resources and time needed to collect the communication patterns of large-scale parallel programs, making it possible to collect them on a small system.
Summary of the invention
The problem the present invention solves is that, in the prior art, it is difficult to obtain the communication track of a large-scale application by running it directly on a large-scale host; the invention also obtains the communication patterns of an application without analyzing its source code.
To solve the above problem, the invention provides a communication track expanding method, comprising:
Forming a plurality of algorithm events, one from each of the original communication track files generated by running the application program at a plurality of different operation scales, where the number of participating processes differs among the operation scales;
Determining the relations among the plurality of algorithm events, which reflect the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology;
Generating, based on the relations among the plurality of algorithm events, the target communication track file of the application program running at a target scale, where the operation scales are smaller than the target scale; wherein
Forming an algorithm event from the original communication track file generated by running the application program at each operation scale comprises:
Extracting the original communication track file of each process of the application program running at that operation scale;
Matching the attributes of the atomic events in the original communication track file, determining the grammatical relations among the atomic events, and organizing the atomic events according to those relations to form a grammatical event sequence;
Determining the meaning and mutual relations of the grammatical events in the grammatical event sequence, and constructing the algorithm event according to the mutual relations of the grammatical events.
Optionally, the communication track expanding method further comprises, before matching the attributes of the atomic events in the original communication track file, analyzing the atomic events in the original communication track file and extracting multiple attributes of each atomic event; the attributes include the name of the message passing interface function, its parameter values, and the file name and line number where the function occurs.
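As an illustration of the attributes listed above, the following minimal sketch models an atomic event as a record holding the MPI function name, its parameter values, and the source file and line of the call. The `AtomicEvent` type, the `parse_event` helper, and the `|`-separated record layout are all hypothetical; the text does not specify the on-disk format of the original communication track file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicEvent:
    name: str      # MPI function name, e.g. "MPI_Send"
    params: tuple  # (key, value) pairs of the call's parameter values
    file: str      # source file in which the call occurs
    line: int      # line number of the call


def parse_event(record: str) -> AtomicEvent:
    """Parse one record of the assumed form 'name|k=v,k=v|file|line'."""
    name, raw_params, file, line = record.split("|")
    params = tuple(tuple(p.split("=", 1)) for p in raw_params.split(",") if p)
    return AtomicEvent(name, params, file, int(line))
```

Keeping the file name and line number in the record lets later stages treat two calls with equal parameters but different call sites as different events.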
Optionally, a first communication track file is generated for each process after the attributes of the atomic events in the original communication track file are matched. Determining the grammatical relations among the atomic events comprises identifying the communication patterns among the atomic events, with the following steps:
Take one communication track from the first communication track file of each process; if all are collective communication tracks, record all the collective communication tracks;
If all are point-to-point communication tracks, perform point-to-point communication grammatical analysis and complete the search for communication blocks;
When all communication tracks in the first communication track file of every process have been processed, finish identifying the communication patterns among the atomic events.
Optionally, the point-to-point communication grammatical analysis comprises:
Take out the track information contained in the point-to-point communication track, which includes the message type, source process number, target process number, message identifier, and communication domain;
Determine whether the temporarily stored track information contains track information that pairs with it; if so, record the pairing; if not, temporarily store the track information of the point-to-point communication track.
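The pairing step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Record` type, its field names, and the rule that a send and a receive pair when source, target, message identifier, and communication domain all agree are assumptions made for the example.

```python
from collections import namedtuple

# kind is "send" or "recv"; the remaining fields mirror the track information
# named in the text (source, target, message identifier, communication domain).
Record = namedtuple("Record", "kind src dst tag comm")


def match_point_to_point(records):
    """Pair sends with receives; unmatched records stay in temporary storage."""
    pending = []  # temporarily stored track information
    pairs = []    # recorded (send, recv) pairings
    for rec in records:
        for i, other in enumerate(pending):
            same_message = (other.src, other.dst, other.tag, other.comm) == (
                rec.src, rec.dst, rec.tag, rec.comm)
            if other.kind != rec.kind and same_message:
                send, recv = (rec, other) if rec.kind == "send" else (other, rec)
                pairs.append((send, recv))
                del pending[i]
                break
        else:
            # No counterpart is stored yet, so keep this record for later.
            pending.append(rec)
    return pairs, pending
```

A linear scan of the pending list keeps the sketch short; an index keyed on (source, target, identifier, domain) would be the natural choice for long tracks.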
Optionally, completing the search for communication blocks comprises:
For each temporarily stored communication track, look up the communication track of the process it would pair with and check whether a send-receive pair can be formed; if so, record the pairing; otherwise keep the track information of this communication track in temporary storage;
For an incomplete communication block, look up the communication tracks of the corresponding processes and perform the point-to-point communication grammatical analysis;
Determine whether the incomplete communication blocks, together with the newly formed send-receive pairs, form a communication block; if so, record the communication block;
Determine whether any incomplete communication block remains; if so, continue the search for communication blocks; otherwise end the search.
Optionally, determining whether the incomplete communication blocks, together with the newly formed send-receive pairs, form a communication block, and recording the communication block if so, comprises:
Add a newly formed send-receive pair to the incomplete communication block;
When every process has taken part in one or more communications, determine that a communication block has been formed;
Record the communication block so formed;
Repeat the above steps until all newly formed send-receive pairs have been processed.
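The block-completion rule above (a communication block is complete once every process has taken part in at least one communication) can be sketched as below. The function name `group_into_blocks` and the reduction of each send-receive pair to a `(source, target)` tuple are simplifications assumed for this example; the text describes the search as interleaved with the per-process track lookup, which is omitted here.

```python
def group_into_blocks(pairs, num_procs):
    """Group send-receive pairs into communication blocks.

    pairs: iterable of (source, target) process numbers, in matching order.
    Returns (complete_blocks, incomplete_block).
    """
    blocks = []
    current = []  # the incomplete communication block being filled
    seen = set()  # processes that have taken part in the current block
    for src, dst in pairs:
        current.append((src, dst))
        seen.update((src, dst))
        if len(seen) == num_procs:
            # Every process has participated: the block is complete.
            blocks.append(current)
            current, seen = [], set()
    return blocks, current
```

For example, with four processes the pairs (0,1), (2,3), (1,2), (3,0) yield two blocks of two pairs each and no incomplete remainder.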
Optionally, determining the meaning and mutual relations of the grammatical events in the grammatical event sequence comprises discovering circulation laws whose loop bodies are communication modules. The communication modules are contained in the second communication track file generated after the search for communication blocks is complete, and each communication module comprises at least one communication block. A circulation law comprises the circulation framework and the cycle count of the loop body. The circulation laws are discovered as follows:
Determine the minimum number of communication blocks, len, among all loop bodies, where len is a natural number;
Find the circulations whose loop body consists of len communication blocks;
Merge the circulations found and update the second communication track file;
Repeat the above steps until no circulation remains in the second communication track file, generating a third communication track file that carries the circulation laws.
Optionally, finding the circulations whose loop body consists of len communication blocks comprises:
Find in the second communication track file all communication modules that comprise len communication blocks; starting from the first of them, compare each pair of adjacent communication modules in turn until the n-th communication module differs from the (n+1)-th; a circulation whose loop body consists of len communication blocks has then been found, and the cycle count of the loop body is n;
Starting from the (n+1)-th communication module, continue comparing adjacent communication modules in turn until all communication modules comprising len communication blocks have been compared; where n is a natural number and n > 1.
Optionally, comparing each pair of adjacent communication modules in turn comprises:
Compare the corresponding communication blocks of the two adjacent communication modules in turn; if the m-th communication block of the n-th communication module differs from the m-th communication block of the (n+1)-th communication module, determine that the n-th and (n+1)-th communication modules differ; if all corresponding communication blocks of the two communication modules are identical, determine that the n-th and (n+1)-th communication modules are identical; where m is a natural number and m ≤ len.
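The adjacent-module comparison described above amounts to run-length encoding over modules of len communication blocks. The sketch below shows one pass for a given body length; the name `find_circulations` and the assumption that modules are cut at fixed offsets from the start of the sequence are illustrative choices, and communication blocks are assumed to be comparable with `==`.

```python
def find_circulations(blocks, body_len):
    """Run-length encode a block sequence using modules of body_len blocks.

    Returns a list of (loop_body, cycle_count). Entries with a count of 1
    are not circulations; the text requires n > 1.
    """
    # Cut the sequence into communication modules of body_len blocks each.
    modules = [tuple(blocks[i:i + body_len])
               for i in range(0, len(blocks), body_len)]
    runs = []
    for module in modules:
        # Tuple equality compares the corresponding blocks one by one.
        if runs and runs[-1][0] == module:
            runs[-1][1] += 1
        else:
            runs.append([module, 1])
    return [(body, count) for body, count in runs]
```

A full implementation would repeat this pass with growing body lengths and rewrite the track after each merge, as the steps above describe.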
Optionally, determining the relations among the plurality of algorithm events comprises forming a logical representation of the plurality of algorithm events, which includes forming the logical structure of the circulation frameworks, with the following steps:
Step S601: run the application program k times with different numbers of participating processes to obtain k third communication track files and their k circulation frameworks, each circulation framework comprising at least one first circulation framework; where k is a natural number and k > 1;
Step S602: take from each third communication track file the one first circulation framework with the largest number of communications, obtaining k first circulation frameworks corresponding to the k circulation frameworks;
Step S603: compare whether the k first circulation frameworks are identical; if identical, merge the cycle counts of the corresponding first circulation frameworks; if different, merge the cycle counts taking the first circulation framework of the largest process number as the basis;
Step S604: repeat steps S602 and S603, comparing the first circulation frameworks with the next-largest number of communications in the third communication track files, until all first circulation frameworks have been compared;
In step S602, when more than one first circulation framework has the largest number of communications, the one that comes first in the order of the third communication track is taken.
Optionally, each first circulation framework comprises at least one second circulation framework, and merging the cycle counts taking the first circulation framework of the largest process number as the basis comprises:
Compare the second circulation framework with the largest number of communications in the first circulation framework of the largest process number with the second circulation frameworks with the largest number of communications in the other first circulation frameworks, and merge the cycle counts;
Repeat the above step, comparing the second circulation frameworks with the next-largest number of communications in the first circulation framework of the largest process number and merging the cycle counts, until all second circulation frameworks in the first circulation framework of the largest process number have been compared.
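One round of steps S602 and S603 can be sketched as follows, under several assumptions that the text does not fix: each run is reduced to a flat list of `(loop_body, cycle_count)` entries in track order, the "number of communications" of a framework is approximated as body length times cycle count, and "merging the cycle counts" is modeled as collecting the count observed at each process number. The name `merge_top_loops` is hypothetical, and nested (second) circulation frameworks are not handled.

```python
def merge_top_loops(runs):
    """Merge the busiest loop of each run into one body plus per-scale counts.

    runs: {num_procs: [(loop_body, cycle_count), ...]} with hashable bodies.
    Returns (loop_body, {num_procs: cycle_count}).
    """
    tops = {}
    for procs, loops in runs.items():
        # max() returns the first maximal element, so on a tie the loop that
        # appears earlier in the track is taken.
        tops[procs] = max(loops, key=lambda loop: len(loop[0]) * loop[1])
    bodies = {body for body, _ in tops.values()}
    if len(bodies) == 1:
        body = next(iter(bodies))
    else:
        # Frameworks differ: use the run with the most processes as the basis.
        body = tops[max(tops)][0]
    counts = {procs: tops[procs][1] for procs in sorted(tops)}
    return body, counts
```

The resulting mapping from process number to cycle count is the kind of data from which a relation between loop counts and scale could then be derived.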
Optionally, forming the logical representation of the plurality of algorithm events further comprises forming the logical structure of the communication blocks. The logical structure of a communication block is a first logical structure, in which every row of communications in the communication block has the same number of communications, all processes take part in the communication, and the difference between adjacent process numbers is the same within each row. The first logical structure is represented by the expression P0F[P1, P2, P3]F_, where:
P0 is the first process number of the communication block;
P1 indicates whether the first and last process numbers in a row of communications are the same;
P2 is the difference between adjacent process numbers in a row of communications;
P3 is the number of distinct (non-repeated) processes in a row of communications;
F is the difference between corresponding process numbers when different rows are compared, F=(Fi)...(F2)(F1);
F_ is the number of times [P1, P2, P3] must be repeated, F_=*N1*N2...*Ni;
i is a natural number giving the number of times the expression of the first logical structure must be expanded.
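To make the parameters P0 to P3 concrete, the sketch below derives them for a single row of process numbers. It is one possible reading of the definitions, not the patented encoding: the helper name `describe_row` is hypothetical, a row whose first and last process numbers coincide is treated as closed and its final element is ignored when checking the common difference, and the row-to-row terms F and F_ are not modeled.

```python
def describe_row(row):
    """Return P0..P3 for one row of process numbers, or None if the
    adjacent differences in the row are not all equal."""
    closed = len(row) > 1 and row[0] == row[-1]  # P1: first == last?
    core = row[:-1] if closed else row
    diffs = {b - a for a, b in zip(core, core[1:])}
    if len(diffs) > 1:
        return None  # not an equal-difference row
    step = diffs.pop() if diffs else 0
    return {
        "P0": row[0],         # first process number
        "P1": closed,         # whether first and last process numbers match
        "P2": step,           # difference between adjacent process numbers
        "P3": len(set(row)),  # number of distinct processes
    }
```

For the row 0, 2, 4, 6, 0 this gives a closed row with step 2 over four distinct processes, while the row 0, 1, 3 is rejected because its differences are unequal.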
Optionally, forming the logical representation of the plurality of algorithm events further comprises forming the logical structure of the communication blocks. The logical structure of a communication block is a second logical structure, in which every row of communications in the communication block has the same number of communications, all processes take part in the communication, and the process numbers in each row are arranged with equal differences. The second logical structure is represented by the expression <H0|S>{H1, H2}G, where:
H0 is the smallest process number in a row of communications;
H1 indicates whether the process numbers communicating in the first column and in the last column are the same;
H2 is the difference between adjacent process numbers after they are rearranged in descending order;
S is the position of the smallest process number;
G is the number of times {H1, H2} must be repeated, G=*M1*M2...*Mj;
j is a natural number giving the number of times the expression of the second logical structure must be expanded.
To solve the above problem, the invention also provides a communication track drive simulation method, comprising:
Generating a target communication track file with the communication track expanding method described above;
Feeding the generated target communication track file into a target machine architecture simulator and running the simulation.
To solve the above problem, the invention also provides a communication track expanding device, comprising:
An algorithm event forming unit, which forms an algorithm event from the original communication track file generated by running the application program at one operation scale;
A control unit, which controls the algorithm event forming unit so that the application program is run at a plurality of different operation scales to form a plurality of algorithm events, where the number of participating processes differs among the operation scales;
A logic generation unit, which determines the relations among the plurality of algorithm events; these relations reflect the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology;
A target communication track file generation unit, which generates, based on the relations among the plurality of algorithm events, the target communication track file of the application program running at a target scale, where the operation scales are smaller than the target scale;
The algorithm event forming unit comprises:
An extraction unit, which extracts the original communication track file of each process of the application program running at one operation scale;
A parsing unit, which matches the attributes of the atomic events in the original communication track file, determines the grammatical relations among the atomic events, and organizes the atomic events according to those relations to form a grammatical event sequence;
An algorithm analysis unit, which determines the meaning and mutual relations of the grammatical events in the grammatical event sequence and constructs the algorithm event according to the mutual relations of the grammatical events.
Optionally, the algorithm event forming unit further comprises an event analysis unit connected to the extraction unit and to the parsing unit, which, before the attributes of the atomic events in the original communication track file are matched, analyzes the atomic events of the original communication track file and extracts multiple attributes of each atomic event; the attributes include the name of the message passing interface function, its parameter values, and the file name and line number where the function occurs.
Optionally, the parsing unit generates a first communication track file for each process after matching the attributes of the atomic events in the original communication track file. Determining the grammatical relations among the atomic events comprises identifying the communication patterns among the atomic events, which the parsing unit does as follows:
Take one communication track from the first communication track file of each process; if all are collective communication tracks, record all the collective communication tracks;
If all are point-to-point communication tracks, perform point-to-point communication grammatical analysis and complete the search for communication blocks;
When all communication tracks in the first communication track file of every process have been processed, finish identifying the communication patterns among the atomic events.
Optionally, the point-to-point communication grammatical analysis performed by the parsing unit comprises:
Take out the track information contained in the point-to-point communication track, which includes the message type, source process number, target process number, message identifier, and communication domain;
Determine whether the temporarily stored track information contains track information that pairs with it; if so, record the pairing; if not, temporarily store the track information of the point-to-point communication track.
Optionally, described parsing unit completes searching of communication block and comprises:
Whether search the communication track of the process of pairing with it for every communication track in temporary, check can form to send to receive pairing, be to record described pairing situation, otherwise the trace information that temporary this communication track comprises;
For incomplete communication block is searched the communication track of corresponding process, and carry out described point-to-point communication grammatical analysis;
Judging that all incomplete communication block receive pairing with the new transmission forming and whether formed communication block, is to record described communication block;
Judging whether also to have incomplete communication block, is to continue communication block to search, otherwise finishes searching communication block.
Optionally, described parsing unit judges that all incomplete communication block receive pairing with the new transmission forming and whether formed communication block, is to record described communication block to comprise:
The new transmission forming is received to pairing to be added in incomplete communication block;
In the time that all processes have all been participated in the communication of one or many, be judged as and formed a communication block;
Record the communication block of described formation.
Optionally, the algorithm analysis unit determining the meaning and mutual relationships of the grammar events in the grammar event sequence comprises finding loop patterns whose loop bodies are communication modules. The communication modules are contained in the second communication track file that the parsing unit generates after completing the search for communication blocks, and each communication module comprises at least one communication block. A loop pattern comprises a loop framework and the loop count of the loop body. The algorithm analysis unit finding loop patterns comprises:
Determining the minimum number of communication blocks, len, that any loop body contains, where len is a natural number;
Finding the loops whose loop body consists of len communication blocks;
Merging the loops found and updating the second communication track file;
Repeating the above operations until no loop remains in the second communication track file, thereby generating a third communication track file that carries the loop patterns.
Optionally, the algorithm analysis unit finding the loops whose loop body consists of len communication blocks comprises:
From the second communication track file, finding all communication modules that comprise len communication blocks and, starting from the 1st of them, comparing adjacent communication modules two at a time until the n-th communication module differs from the (n+1)-th; a loop whose loop body consists of len communication blocks has then been found, and its loop count is n;
Continuing to compare adjacent communication modules two at a time, starting from the (n+1)-th communication module, until all communication modules comprising len communication blocks have been compared; where n is a natural number and n > 1.
Optionally, the logic generation unit comparing adjacent communication modules two at a time comprises:
Comparing the corresponding communication blocks of the two adjacent communication modules in turn; if the m-th communication block of the n-th communication module differs from the m-th communication block of the (n+1)-th communication module, the n-th communication module is judged to differ from the (n+1)-th; if all corresponding communication blocks of the two modules are identical, the n-th communication module is judged to be identical to the (n+1)-th; where m is a natural number and m ≤ len.
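The loop search for one fixed body length can be sketched as below. This is only an illustration under stated assumptions: communication blocks are represented by any comparable values, modules are cut from the start of the track at multiples of `body_len`, and the names `modules_equal` and `find_loops` are hypothetical.

```python
def modules_equal(a, b):
    """Compare two modules block by block; any mismatch makes them differ."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def find_loops(blocks, body_len):
    """Collapse runs of identical adjacent modules of body_len blocks.

    Returns a list of (body, count) entries; count > 1 marks a loop.
    Trailing blocks that do not fill a whole module are kept with count 1.
    """
    modules = [blocks[i:i + body_len] for i in range(0, len(blocks), body_len)]
    result = []
    for module in modules:
        if result and modules_equal(result[-1][0], module):
            body, count = result[-1]
            result[-1] = (body, count + 1)
        else:
            result.append((module, 1))
    return result
```

For example, `find_loops(["A", "B", "A", "B", "A", "B", "C", "D"], 2)` reports the body `["A", "B"]` with loop count 3 followed by `["C", "D"]` with count 1. Repeating the search on the collapsed result with other body lengths would correspond to the "repeat until no loop remains" step.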
Optionally, the logic generation unit determining the relationship among the plurality of algorithm events comprises forming a logical representation of the algorithm events, which comprises forming the logical structure of the loop frameworks. The logic generation unit forming the logical structure of the loop frameworks comprises:
Running the application program k times with different numbers of participating processes to obtain k third communication track files and their k loop frameworks, each loop framework comprising at least one first loop framework; where k is a natural number and k > 1;
Taking from each third communication track file the one first loop framework with the largest number of communications, thereby obtaining k first loop frameworks corresponding to the k loop frameworks;
Comparing whether the k first loop frameworks are identical; if they are, merging the loop counts of the corresponding first loop frameworks; if not, merging the loop counts on the basis of the first loop framework of the run with the largest number of processes;
Repeating the operations of taking from each third communication track file the not-yet-compared first loop framework with the largest number of communications, comparing, and merging, until all first loop frameworks have been compared;
Wherein, when the logic generation unit takes from each third communication track file the first loop framework with the largest number of communications and more than one first loop framework has that largest number, it takes the one that comes first in the third communication track.
Optionally, each first loop framework comprises at least one second loop framework, and the logic generation unit merging the loop counts on the basis of the first loop framework of the run with the largest number of processes comprises:
Comparing the second loop framework with the largest number of communications in the first loop framework of the run with the largest number of processes against the second loop frameworks with the largest number of communications in the other first loop frameworks, and merging the loop counts;
Comparing the remaining second loop frameworks of that first loop framework in turn and merging the loop counts, until all second loop frameworks in the first loop framework of the run with the largest number of processes have been compared.
To solve the above problems, the present invention also provides a communication-track-driven simulation system, comprising a target machine architecture simulator and the communication track expanding device described above;
The communication track expanding device is configured to expand and generate the target communication track file of the application program running at the target scale, and to input it to the target machine architecture simulator;
The target machine architecture simulator is configured to carry out a simulation run driven by the target communication track file and to produce the simulation result.
Compared with the prior art, this technical solution has the following advantages:
The original communication track files obtained by running the application program at several different running scales are extracted and analyzed with a global communication-track pattern analysis method: the communication tracks are given a logical representation, the communication relationships between processes are located, the loop structure of the communication tracks is located, and the patterns relating the communication tracks to the running scale are analyzed, so that the communication track of a large-scale run is obtained by expansion to support computer architecture simulation at the target scale. The communication track expanding method can quickly and directly generate the communication track of the application running at the target scale without running the application on a real large-scale host. It also obtains the communication patterns of the application without analyzing the application's source code, which effectively reduces the difficulty of analyzing the original program.
Representing the communication blocks in a communication track as a logical structure makes the track easy to express and to expand, so that the relationship between the communication track and the number of processes, and between the communication track and the inter-process topology, can be shown more readily.
Feeding the target communication track file generated by the communication track expanding method into the target machine architecture simulator improves the simulation accuracy of the simulator and reduces the simulation cost, which in turn saves development time and reduces development cost.
Brief description of the drawings
Fig. 1 is a schematic diagram of the local-first, then-global communication track analysis process;
Fig. 2 is a schematic diagram of the global communication track analysis process;
Fig. 3 is a flowchart of the communication track expanding method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the compiler's weak symbol feature;
Fig. 5 is a flowchart of identifying the communication patterns among atomic events;
Fig. 6 is a flowchart of point-to-point communication track syntax analysis;
Fig. 7 is a flowchart of completing the search for communication blocks;
Fig. 8 is a flowchart of determining whether a communication block has been formed;
Fig. 9 is a flowchart of finding loop patterns;
Fig. 10 is a flowchart of forming the logical structure of the loop frameworks;
Fig. 11 is a schematic diagram of the communication block structure;
Fig. 12 is a schematic diagram of the ratio between the total number of communications obtained by logical communication pattern expansion and the actual total for the NPB test programs;
Fig. 13 is a schematic structural diagram of the communication track expanding device provided by an embodiment of the present invention.
Detailed description of the embodiments
In the prior art it is difficult to obtain the communication track of a large-scale application by running it directly on a large-scale host, and methods that obtain an application's communication patterns without analyzing its source code can only analyze how local communication tracks vary, so they do not apply when the number of processes changes. This technical solution extracts the original communication track files obtained by running the application program several times at different running scales and analyzes them with a global communication-track pattern analysis method: the communication tracks are given a logical representation, the communication relationships between processes are located, the loop structure of the communication tracks is located, and the patterns relating the communication tracks to the running scale are analyzed, so that the communication track of a run at the target scale is obtained by expansion. The large-scale communication track of the application can thus be generated quickly and directly without running the application on a real large-scale host. The communication patterns of the application are also obtained without analyzing its source code, which effectively reduces the difficulty of analyzing the original program.
To make the above objects, features and advantages of the present invention clearer, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Specific details are set forth in the following description to provide a thorough understanding of the invention. The invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The invention is therefore not limited to the specific embodiments disclosed below.
MPI communication functions fall into two broad classes: collective communication and point-to-point communication. In general, when a collective communication occurs, all processes take part and each process calls the same collective communication function; from this point of view there is no difference between the processes. The specific algorithm of a communication function is implemented inside the MPI communication library and is independent of the application.
Collective communication is expensive, and in a concrete application an individual process does not need all the data: it only needs to obtain its own data or to send local data to a designated process, which is done with point-to-point communication. Point-to-point communication directly reflects how the application uses the compute nodes of the underlying high-performance computer and how data is transferred, and its communication pattern changes with the problem size and the machine scale; this is the part of greatest interest here. Locally, a process tends to communicate with neighboring processes; globally, local communications combine into a global pattern with a definite structure; over the lifetime of a single process, repeated loops of the same structure make up the main body of its execution. There is therefore always some logical structure between processes, and it can reveal how the application uses the underlying hardware.
Different applications have different characteristics and correspond to different parallel algorithms. Processes may be organized in a one-, two- or three-dimensional topology or some other topology, but point-to-point communication generally takes place between "neighbors" in that topology. Moreover, as the application scale and/or the machine scale changes, the "neighbors" of a process with a given rank also change, and this change follows a definite pattern.
In MPI source code, point-to-point communication is also the most direct to use: the source or destination process of a message is specified through the corresponding MPI function. For example, consider the following MPI pseudocode:
irecv((my_id+num-1)%num);
send((my_id+num+1)%num);
wait(irecv);
With 4 processes, process 0 receives the message from process 3; with 8 processes, process 0 receives the message from process 7; and so on. It follows that process 0 always receives its message from the highest-numbered process, which is the rule by which process 0 receives messages.
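The rule can be checked with a short sketch that evaluates the two expressions of the pseudocode at several scales. The helper names `recv_source` and `send_dest` are hypothetical; the 1024-process case stands in for a scale that was never actually run.

```python
def recv_source(my_id, num):
    """Rank that my_id receives from: irecv((my_id + num - 1) % num)."""
    return (my_id + num - 1) % num


def send_dest(my_id, num):
    """Rank that my_id sends to: send((my_id + num + 1) % num)."""
    return (my_id + num + 1) % num


# Source of process 0 at several scales, including one that was never run.
sources = {num: recv_source(0, num) for num in (4, 8, 1024)}
```

Here `sources` maps 4 to 3, 8 to 7 and 1024 to 1023, which is the "last-numbered process" rule stated above expressed as a function of the process count.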
The inventors therefore seek to find the patterns by which point-to-point communication between processes varies by analyzing the communication tracks of MPI programs.
This embodiment takes the MPI programs of large-scale applications as its object of study. Communication tracks are obtained by running the MPI program on a small-scale machine and their communication patterns are studied; through abstraction and logical representation, the patterns by which the communication track varies are found and the communication track is extrapolated. The extrapolation process is simple and does not require multiple processes to take part, and the extrapolated result largely agrees with the original communication track.
Because MPI programs follow the SPMD model, the original program is logically complete: it knows how to control each participating process and tells each process which part of the overall computation to carry out, whose messages to receive, to whom to send messages, and so on.
While an MPI program runs, each process records its own communication track separately and the tracks of different processes differ from one another. The originally complete communication pattern is thus split into separate logical fragments, and each individual communication track shows only part of the whole. Consider, for example, an MPI program in which the even-numbered processes only send and the odd-numbered processes only receive.
If the communication tracks of the processes are analyzed separately, each process knows only about the messages it receives and sends locally: the communication tracks of even-numbered processes contain only sends (send) and those of odd-numbered processes contain only receives (recv). A process cannot determine its own position in the whole, and the complete communication pattern cannot be captured.
The complete communication pattern is expressed in the source code but is split up by the actual run. When the communication tracks of different processes are analyzed separately, some logical fragments may turn out to have the same structure, but the complete communication pattern still cannot be captured as a whole.
The embodiment of the present invention therefore reasons backward: it recombines the logical fragments, gradually builds up the complete communication pattern, and recovers the communication behavior of the original parallel program. For the communication tracks produced by the previous example, looking at each track on its own suggests that all even-numbered processes have the same track, all odd-numbered processes have the same track, and the merged track contains both kinds. Looking globally and matching the send-receive pairs (send-recv) of the different processes one by one instead shows that the program is simply an exchange of messages between even-numbered and odd-numbered processes. This logical statement fits not only the current run but a run with any number of processes.
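A minimal sketch of this global matching follows. It assumes, purely for illustration, that each even-numbered process sends to the next odd-numbered one; the text does not state the peers. The per-process track layout of (operation, peer) tuples and the name `pair_send_recv` are likewise hypothetical.

```python
from collections import Counter


def pair_send_recv(tracks):
    """Match each send with a receive posted by its destination process.

    tracks maps a rank to its ordered list of (op, peer) events, where op
    is "send" or "recv". Returns the (sender, receiver) pairs found and
    the sends that have no matching receive.
    """
    recvs = Counter()
    for rank, events in tracks.items():
        for op, peer in events:
            if op == "recv":
                recvs[(peer, rank)] += 1   # keyed as (sender, receiver)
    pairs, unmatched = [], []
    for rank in sorted(tracks):
        for op, peer in tracks[rank]:
            if op != "send":
                continue
            key = (rank, peer)
            if recvs[key] > 0:
                recvs[key] -= 1
                pairs.append(key)
            else:
                unmatched.append(key)
    return pairs, unmatched


# Assumed four-process example: evens only send, odds only receive.
tracks = {0: [("send", 1)], 1: [("recv", 0)],
          2: [("send", 3)], 3: [("recv", 2)]}
pairs, unmatched = pair_send_recv(tracks)
```

Seen separately, tracks 0 and 2 look identical, as do tracks 1 and 3; only the paired view shows the even-to-odd exchange.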
In general, a communication pattern comprises several levels:
i) Basic communication unit: for example send-recv, or a non-blocking function paired with a wait function (Wait);
ii) Chained communication: when send-recv pairs follow one another end to end, several pairs form a communication chain. A chain may cover only some of the processes, so the processes as a whole may form several chains.
iii) Single-block communication: a form of communication in which all processes take part is called single-block communication. The processes may be divided into several groups, each forming a chain, and the chains may be unrelated to one another.
iv) Multi-block communication: as the algorithm requires, the processes may be organized in different forms in the course of the computation, and the different organizations vary according to a definite pattern.
There are two possible ways to obtain the complete logical pattern from the scattered communication tracks.
The first method finds the pattern of each process first and then combines all processes to find the global pattern. Fig. 1 is a schematic diagram of this local-first, then-global analysis process. As shown in Fig. 1, one process is chosen first (process 3 in Fig. 1) and the pattern of its communication track is analyzed; the patterns of all processes in a single run are then analyzed together to find the pattern of the loop bodies common to all processes; finally, the global pattern across processes over several runs is analyzed to derive the relationship between the communication pattern and both the number of processes and the process ranks. Many current studies use this analysis process. Because it starts by looking only at local features of the communication track and concentrates on local detail, it may obscure some global patterns and affect the global analysis in later steps.
The inventors therefore adopt the other method, which first combines the communication tracks of all processes of a run and then finds the global pattern. Fig. 2 is a schematic diagram of the global communication track analysis process. As shown in Fig. 2, the pattern of a segment of the communication tracks of all processes in a single run is analyzed first (producing a single communication track file); the pattern of the whole communication track of that run is then analyzed (finding the loop patterns); finally, the global pattern across processes over several runs is analyzed to derive the relationship between the communication pattern and both the number of processes and the process ranks. This method looks for the global communication pattern from the outset: it merges the communication track fragments of all processes into a single complete communication track and then analyzes the application's communication logic step by step, starting from the global pattern. Take the MPI pseudocode given above:
irecv((my_id+num-1)%num);
send((my_id+num+1)%num);
wait(irecv);
Viewed globally, the overall communication pattern (with four participating processes) is obtained at once:
0→1→2→3→0
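The chain above can be recovered mechanically once the send destinations of all processes are viewed together. The sketch below assumes the destinations have already been read from each process's track into a dictionary; `follow_sends` is a hypothetical helper.

```python
def follow_sends(send_to, start=0):
    """Follow send destinations from start until the chain closes."""
    chain = [start]
    current = send_to[start]
    while current != start:
        if len(chain) > len(send_to):
            raise ValueError("sends do not return to the start process")
        chain.append(current)
        current = send_to[current]
    chain.append(start)
    return chain


# Destination recorded in each process's own track for a 4-process run.
send_to = {rank: (rank + 4 + 1) % 4 for rank in range(4)}
ring = "→".join(map(str, follow_sends(send_to)))
```

With these inputs `ring` is the string `0→1→2→3→0`, the same global pattern as written above.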
The embodiment of the present invention uses the second method to implement Global Regulation On Parallelization Extrapolation (GROPE). It obtains the global pattern of the communication and the nesting of its loops and relates them to the running scale, extracting as many of the logical patterns contained in the communication tracks as possible, so that the communication and computation characteristics of the application can be understood intuitively and the communication track can be predicted.
The NPB (NAS Parallel Benchmarks) parallel test programs are a program suite developed by the NASA Advanced Supercomputing Division for testing the performance of parallel supercomputers. As a standard measure of supercomputer performance, the suite is widely accepted in scientific computing. It consists of 8 programs, 5 kernels and 3 simulated applications, all derived from large-scale computational fluid dynamics (CFD) applications. The 5 kernels model the computational core of 5 numerical methods used in CFD, and the 3 simulated applications reproduce the data movement and computation of complete CFD programs. This embodiment selects the BT, CG, MG and SP programs of NPB as its objects of study: they have complex communication patterns and also allow a prototype system to be built quickly.
Fig. 3 is a flowchart of the communication track expanding method provided by an embodiment of the present invention. As shown in Fig. 3, the communication track expanding method comprises:
Step S101: extract the original communication track file of each process of an application program running at one running scale;
Step S102: match the attributes of the atomic events in the original communication track files, determine the grammatical relationships among the atomic events, and organize the atomic events into a grammar event sequence according to those relationships;
Step S103: determine the meaning and mutual relationships of the grammar events in the grammar event sequence, and construct an algorithm event from the mutual relationships of the grammar events;
Step S104: provide different running scales and repeat the above steps, running the application program several times at several different running scales to form a plurality of algorithm events, where each run of the application program at one running scale forms one algorithm event;
Step S105: determine the relationship among the plurality of algorithm events, which reflects the relationship between the communication track and the number of processes and between the communication track and the inter-process topology;
Step S106: based on the relationship among the plurality of algorithm events, generate the target communication track file of the application program running at the target scale, the running scales being smaller than the target scale.
The communication track expanding method is described in detail below with reference to an embodiment.
First, step S101 is performed: the original communication track file of each process of the application program running at one running scale is extracted. Specifically, the application program is run on a computer system of a smaller running scale (with fewer participating processes), and the weak symbol feature of the compiler can be used to record the communication track while the MPI program runs. Fig. 4 is a schematic diagram of the compiler's weak symbol feature. Referring to Fig. 4, when processes communicate point to point while the application runs, for example by sending a message (MPI_Send) as shown in the figure, the call passes through an instrumentation library that redefines MPI_Send and calls the PMPI_Send interface in the MPI library. The communication can therefore be recorded during this instrumentation, which yields the original communication track file of each process of the application program running at that scale. Step S101 is implemented with conventional means of the prior art and is not described further here.
In general, a communication track can be used as simulator input immediately after extraction to obtain simulation results. Using the original communication track directly, however, limits its range of use: only target machines whose structure is the same as or similar to the host's can be simulated. To widen the range of use, the communication track file can be analyzed and expanded, and the new communication track used as the simulator input. In this embodiment, therefore, the communication track file extracted from a run on a small-scale computer is called the original communication track file, and the communication track file for a run on a large-scale computer, obtained by analyzing and expanding the original communication track file (without running on a real large-scale host), is called the target communication track file.
The original communication track file may contain information such as the detailed parameters of the MPI functions called during the run, their execution times, the time intervals between MPI functions, and the position of each MPI function in the source code. It has the following structure: file name. line number. time interval between this function and the previous function. function name (parameter values). elapsed time of this function.
In this embodiment, after the original communication track file has been extracted, the atomic events in it can be analyzed further to extract their attributes. The attributes include the name and parameter values of the MPI function and the file name and line number at which the function appears, and may also include the time interval between the function and the previous function, the elapsed time of the function, and so on. In a given study, an atomic event is obtained directly from the communication track extraction interface; it has a basic meaning and cannot be subdivided. An atomic event can carry attributes, including values intrinsic to the event and attributes of the environment in which it occurs, and these attributes are used in the subsequent syntax analysis and algorithm analysis of the embodiment. For example, the MPI_Send function, which sends a message in point-to-point communication, appears in the original communication track file in the form MPI_Send(buf, count, datatype, dest, tag, comm). MPI_Send is an atomic event, and the items in parentheses are its attributes (respectively the address of the message being sent, the number of data items, the data type of the message, the destination process rank, the message tag, and the communicator). Likewise, the MPI_Recv function, which receives a message in point-to-point communication, is an atomic event.
Event analysis means extracting, as the study requires, the attributes of each atomic event in the original communication track file, such as its name, time, location and related variables, to form a well-defined event sequence in a normalized representation. The original communication track usually exists as character strings, so event analysis amounts to parsing the strings according to certain rules and extracting the required information. These rules are defined broadly, for example: 1) matching of the name: a character string of a certain length that expresses the type of the atomic event; 2) the values of related variables, in particular the values of function parameters, instruction operands, and so on. An atomic event may take the following form:
$e_i^{\phi} = \mathit{name}(\mathit{attr}_1 = \mathit{value}_1,\ \mathit{attr}_2 = \mathit{value}_2,\ \ldots,\ \mathit{attr}_m = \mathit{value}_m)$.
If the communication track holds only a single value, as in a memory-access address track, then:
$e_i^{\phi} = \mathit{value}$.
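Event analysis of one function-call record can be sketched as below. The record text and the `parse_atomic_event` helper are hypothetical; the attribute names for MPI_Send follow the signature quoted above, and those for MPI_Recv follow the standard MPI signature. Records for other functions fall back to positional names.

```python
import re

# Attribute names per function, used to label positional parameter values.
SIGNATURES = {
    "MPI_Send": ("buf", "count", "datatype", "dest", "tag", "comm"),
    "MPI_Recv": ("buf", "count", "datatype", "source", "tag", "comm", "status"),
}

CALL_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_atomic_event(text):
    """Turn 'name(v1, v2, ...)' into {'name': ..., 'attrs': {attr: value}}."""
    match = CALL_RE.match(text)
    if match is None:
        raise ValueError(f"not a function-call record: {text!r}")
    name, arg_text = match.groups()
    values = [v.strip() for v in arg_text.split(",")] if arg_text.strip() else []
    # Unknown functions get positional attribute names arg0, arg1, ...
    names = SIGNATURES.get(name, tuple(f"arg{i}" for i in range(len(values))))
    return {"name": name, "attrs": dict(zip(names, values))}


event = parse_atomic_event("MPI_Send(0x1000, 4, MPI_INT, 1, 99, MPI_COMM_WORLD)")
```

The resulting `event` has the name `MPI_Send` and attributes such as `dest = "1"` and `tag = "99"`, matching the name(attr_1 = value_1, ...) form above. Values are kept as strings; converting them to numbers is left out of this sketch.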
The original or analyzed event sequence is used as simulator input, online or offline, to drive the simulation run.
In other embodiments, when the original communication track file already contains everything that needs to be studied (the attributes of the atomic events), the event analysis step (extracting the attributes of the atomic events) is not needed.
Next, step S102 is performed: the attributes of the atomic events in the original communication track files are matched, the grammatical relationships among the atomic events are determined, and the atomic events are organized into a grammar event sequence according to those relationships. Step S102 is the process of performing syntax analysis on the atomic events in the original communication track file. Syntax analysis means determining the grammatical relationships among atomic events by matching their attributes according to the causal relationships between them, forming the most basic associations of atomic events that have a definite meaning. Such an association is called a grammar event here, and several grammar events form a grammar event sequence. The communication track extraction interface has a definite meaning. For example, in a memory-access address track, each event indicates an address value in memory to be accessed; a communication track extracted at the application programming interface (API) level fully contains the meaning of the functions and the relationships between different functions; and a communication track extracted at the instruction set architecture (ISA) level expresses the function of the machine's instruction set architecture. The interpretation of an interface at a given system level that a communication track reflects is called the grammatical relationship of the communication track (the grammatical relationship of the atomic events in the communication track file).
Extracting the communication track may break the original grammar. For MPI functions, for example, the communication track of each process is only one part of what the original program executes at the MPI interface: the send and receive functions of a point-to-point communication appear in the communication tracks of different processes. The purpose of syntax analysis is therefore to recover the interpretation that the extraction interface originally gave to the events, by analyzing the atomic event sequence with an interface grammar. The interface grammar consists of atomic events, grammar events, a set of productions, and a construction method. A grammar event is a set of atomic events reorganized by grammatical relationship; it may contain a single atomic event or several. A grammar event also has attributes, and the attributes have values that express the nature of the grammar event itself, its running environment, and so on. Grammar events are composed directly of atomic events, and the rules of composition are a set of productions. The productions of the interface grammar describe how atomic events are combined into grammar events. A production comprises the following elements:
i) a grammar event, called the left-hand side;
ii) the symbol ::=
iii) a combination of atomic events and "+" symbols, called the right-hand side, which describes how the production is formed; the order in which the atomic events are joined by "+" expresses the logical order between them.
The construction method is the process of applying the productions of the interface grammar to build grammar events from atomic events. It comprises: reading the atomic events of the communication track in order; finding all productions whose right-hand side contains the current atomic event; matching the next atomic event according to the meaning of each production, which filters out some productions and leaves others; matching the next atomic event again according to the meaning of the remaining productions; and so on until a single production remains. If atomic events on the right-hand side of that production are still unmatched, further atomic events are read until all atomic events on its right-hand side have been matched. Note that a grammar event is also atomic: it is indivisible, which means that a grammar event cannot appear on the right-hand side of a production.
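The filtering of productions can be illustrated with a small sketch. The production set, its names (`SendRecv`, `NonBlockingRecv` and so on) and the simplification that the atomic events of one grammar event are adjacent in the input are all assumptions made for this example; real tracks interleave events from different processes.

```python
# Hypothetical productions: left-hand side ::= atom + atom + ...
PRODUCTIONS = {
    "Barrier": ("barrier",),
    "SendRecv": ("send", "recv"),
    "NonBlockingRecv": ("irecv", "wait"),
    "NonBlockingSend": ("isend", "wait"),
}


def build_grammar_events(atoms):
    """Read atoms in order, narrowing the candidate productions each step."""
    events = []
    i = 0
    while i < len(atoms):
        candidates = PRODUCTIONS
        depth = 0
        matched = None
        while matched is None:
            if i + depth >= len(atoms):
                raise ValueError("input ended inside a production")
            atom = atoms[i + depth]
            # Keep only productions whose next right-hand atom matches.
            candidates = {name: rhs for name, rhs in candidates.items()
                          if len(rhs) > depth and rhs[depth] == atom}
            if not candidates:
                raise ValueError(f"no production matches at atom {i + depth}")
            depth += 1
            complete = [n for n, rhs in candidates.items() if len(rhs) == depth]
            if complete:
                matched = complete[0]
        events.append((matched, tuple(atoms[i:i + depth])))
        i += depth
    return events
```

Feeding it `["irecv", "wait", "send", "recv", "barrier"]` yields three grammar events, one per production. The sketch accepts the first production whose right-hand side is fully matched, which is one possible policy among several.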
The analyzed events are identified and matched according to the MPI programming specification; the items matched include function names, parameter values, and so on. The basic syntax analysis rules for atomic events in MPI programs are:
(1) For collective communication, the same function appears in all processes of the communicator (which, absent special operations, contains all processes), and its parameters are also the same.
(2) For point-to-point communication, the send function and the receive function of a message must match in three parameters: message source/destination, message tag, and communicator. A non-blocking function must have a wait function that waits for it to complete.
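Rule (2) can be written as a small predicate. The `P2PEvent` record and `is_matching_pair` function are hypothetical names for this sketch, and wildcard receives such as MPI_ANY_SOURCE or MPI_ANY_TAG are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class P2PEvent:
    rank: int   # process whose track recorded the event
    op: str     # "send" or "recv"
    peer: int   # destination for a send, source for a recv
    tag: int    # message tag
    comm: str   # communicator name


def is_matching_pair(send, recv):
    """True when source/destination, tag and communicator all agree."""
    return (send.op == "send" and recv.op == "recv"
            and send.peer == recv.rank    # destination is the receiver
            and recv.peer == send.rank    # source is the sender
            and send.tag == recv.tag
            and send.comm == recv.comm)
```

A send from rank 0 to rank 1 with tag 7 matches a receive on rank 1 from rank 0 with tag 7 in the same communicator, and fails to match if any of the three parameters differs.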
Complete after basic MPI function coupling, need to further analyze these unit and how form the more communication rule of large model, apply the structure of algorithm.
In the present embodiment, matching the attributes of the atomic events in the original communication track file in step S102 generates a first communication track file for each process (one communication track file is generated per process, referred to here as the first communication track file). Determining the grammatical relation among the atomic events comprises identifying the communication pattern among the atomic events. Fig. 5 is a flowchart of identifying the communication pattern among the atomic events. Referring to Fig. 5, the specific steps are as follows:
Step S201: take one communication track from the first communication track file of each process;
Step S202: determine the type of the communication tracks taken out, that is, whether they are all collective communication tracks or all point-to-point communication tracks (because every process in the communication domain of a collective communication calls the same function with the same parameters, the tracks taken out are either all collective communication tracks or all point-to-point communication tracks);
If they are all collective communication tracks, perform step S203: record all the collective communication tracks, and go to step S201;
If they are all point-to-point communication tracks, perform step S204: carry out point-to-point communication grammar analysis and complete the search for communication blocks;
Step S205: determine whether all communication tracks in the first communication track file of each process have been processed;
If not, go to step S201;
If so, end the identification of the communication pattern among the atomic events.
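The classification in steps S201 to S205 can be sketched as follows. This is an illustrative sketch only: it assumes that each process's track is a list of MPI function names of equal length that can be read in lockstep, and the set of collective functions is a small hand-picked sample rather than a complete list.

```python
# A small sample of MPI collective functions (not exhaustive).
COLLECTIVES = {"MPI_Bcast", "MPI_Reduce", "MPI_Allreduce", "MPI_Barrier", "MPI_Alltoall"}


def identify_patterns(per_process_tracks):
    """Split lockstep tracks into collective records and point-to-point rounds."""
    collectives, p2p_rounds = [], []
    for step in zip(*per_process_tracks):               # S201: one track per process
        kinds = {name in COLLECTIVES for name in step}  # S202: classify the tracks
        if kinds == {True}:
            collectives.append(step[0])                 # S203: record the collective
        elif kinds == {False}:
            p2p_rounds.append(list(step))               # S204: point-to-point analysis
        else:
            raise ValueError("mixed collective and point-to-point tracks")
    return collectives, p2p_rounds                      # S205: all tracks processed
```

The point-to-point rounds returned here would then be handed to the pairing analysis of step S204.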
Fig. 6 is a schematic flowchart of the grammar analysis of point-to-point communication tracks. Referring to Fig. 6, the point-to-point communication grammar analysis in step S204 specifically comprises:
Step S301: take out the track information contained in the point-to-point communication track; the track information comprises the message type, source process number, target process number, message identifier, and communication domain;
Step S302: determine whether the buffered track information contains track information that pairs with it (the track information taken out first is buffered directly);
If it does, perform step S303: record the pairing;
If it does not, perform step S304: buffer the track information of the point-to-point communication track.
Note that this point-to-point correspondence is the grammatical relation. In point-to-point communication grammar analysis, each successful pairing of track information forms a send/receive pairing. One send/receive pairing is one grammar event, and a sequence of send/receive pairings is a grammar event sequence. For example, process 0 sends a message to process 1, and process 1 receives the message sent by process 0, so "0 → 1" is a grammar event. Likewise, "1 → 2" is a grammar event, and "0 → 1 → 2 → 3" is a grammar event sequence comprising three grammar events.
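Steps S301 to S304 amount to matching each track against a buffer of unpaired tracks. The following Python sketch shows one way this could look; the Track record and its field names are assumptions made for illustration, not structures defined by the patent.

```python
from collections import namedtuple

# Hypothetical record for the track information of step S301.
Track = namedtuple("Track", "kind src dst tag comm")  # kind is "send" or "recv"


def pair_tracks(tracks):
    """Return (send/receive pairings, tracks still buffered without a partner)."""
    pending = []  # buffered track information (S304)
    pairs = []    # recorded pairings, one grammar event each (S303)
    for t in tracks:  # S301: take out the next track information
        # S302: look for a buffered track of the opposite kind with the same
        # source, destination, message identifier and communication domain.
        partner = next(
            (p for p in pending
             if p.kind != t.kind
             and (p.src, p.dst, p.tag, p.comm) == (t.src, t.dst, t.tag, t.comm)),
            None,
        )
        if partner is not None:
            pending.remove(partner)
            pairs.append((t.src, t.dst))
        else:
            pending.append(t)
    return pairs, pending
```

Each recorded pair (src, dst) corresponds to one grammar event such as "0 → 1".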
Fig. 7 is a schematic flowchart of completing the search for communication blocks. Referring to Fig. 7, completing the search for communication blocks in step S204 may specifically comprise:
Step S401: for each buffered communication track, search for the communication track of the process that pairs with it;
Step S402: check whether a send/receive pairing can be formed; if so, perform step S403: record the pairing, and go to step S401; otherwise perform step S404: buffer the track information contained in this communication track;
Step S405: for each incomplete communication block, search for the communication tracks of the corresponding processes, and carry out the point-to-point communication grammar analysis;
Step S406: determine whether the incomplete communication blocks together with the newly formed send/receive pairings form a communication block; if so, perform step S407: record the communication block, then go to step S408; otherwise go directly to step S408;
Step S408: determine whether any incomplete communication block remains; if so, go to step S401; otherwise end the search for communication blocks.
Fig. 8 is a schematic flowchart of determining whether a communication block has been formed. Referring to Fig. 8, step S406 may specifically comprise:
Step S406a: add a newly formed send/receive pairing to the incomplete communication block;
Step S406b: determine whether every process has taken part in at least one communication;
If so, perform step S406c: determine that a communication block has been formed, and record it;
Otherwise go to step S406a.
Step S406d: determine whether all newly formed send/receive pairings have been processed;
If so, end the determination of communication blocks;
Otherwise go to step S406a.
The method described above for identifying the communication pattern among atomic events was used to analyze the NPB programs. The typical communication blocks of the main test programs are shown in Table 1, where the A in the parentheses after each program name denotes computation scale A and the number in the parentheses is the number of processes in the run.
Table 1: Typical communication blocks of the NPB test programs
After the grammar event sequence has been formed in step S102, step S103 is performed: determine the meaning of each grammar event in the grammar event sequence and the relations among them, and construct algorithm events according to those relations. Step S103 is the process of performing algorithm analysis on the grammar event sequence. Algorithm analysis means using the meaning of each grammar event and the relations among the grammar events to form algorithm-level fragments of application logic that carry richer meaning and are closely tied to the application. Like grammar analysis, algorithm analysis follows a set of rules, namely the application grammar. The application grammar consists of grammar events, algorithm events, a set of productions, and a construction method. As described above, a grammar event has attributes, and each attribute has a value; the attributes describe the nature of the grammar event itself, its running environment, and so on. Viewed against all participants and the complete run, the meaning expressed by a grammar event is local and needs to be extended further. An algorithm event is a combination of grammar events that occurs among several (usually all) participants over a period of time and carries relatively complete, self-contained logical meaning, for example a loop, a loop nest, or a branch. The productions of the application grammar describe how grammar events are combined into algorithm events. A production comprises the following elements:
i) an algorithm event, called the left part;
ii) the symbol ::=
iii) a combination of grammar events and algorithm events, called the right part, which describes the composition of the production.
The construction method is the process of applying the productions of the application grammar to build algorithm events from grammar events. Algorithm events cover a broad range of concepts and complex structures, but the most basic are loops and branches, together with application-specific structures. Algorithm events can in turn form new algorithm events; a loop nest, for example, is composed of several algorithm events.
Loops are an important structure in programs. A loop applies the same code to different data and can generate an enormous volume of communication tracks, so identifying the loop structures in a communication track can greatly reduce its size. More importantly, the loop structures determine the overall framework of the program, so that the communication track is no longer a list of fragmented information but has a clear logical architecture, which provides a basis for deeper analysis of the application.
In the present embodiment, determining the meaning of each grammar event in the grammar event sequence and the relations among them comprises discovering loop rules that take a communication module as the loop body. The communication modules are contained in the second communication track file, which is generated after the search for communication blocks is completed (that is, at the end of grammar analysis). A communication module comprises at least one communication block, and a loop rule comprises the loop framework and the loop count of the loop body. Fig. 9 is a schematic flowchart of discovering loop rules. Referring to Fig. 9, the specific steps are as follows:
Step S501: determine the minimum number of communication blocks, len, that any loop body contains, where len is a natural number;
Step S502: find the loops whose loop body consists of len communication blocks;
Step S503: merge the loops found, and update the second communication track file;
Step S504: determine whether the second communication track file still contains loops; if so, go to step S501; otherwise perform step S505: generate a third communication track file that carries the loop rules. The third communication track file is the second communication track file after repeated updating, when no further loops can be found. At this point the third communication track file has an explicit loop framework and explicit loop counts for the loop bodies. Note that len must be determined again each time the second communication track file is updated, so the value of len may change as the file is updated.
Step S502 specifically comprises:
Step S502a: find in the second communication track file all communication modules that contain len communication blocks; starting from the first of these communication modules, compare each pair of adjacent communication modules in turn until the n-th communication module differs from the (n+1)-th; this identifies a loop whose loop body consists of len communication blocks and whose loop count is n;
Step S502b: starting from the (n+1)-th communication module, continue comparing adjacent communication modules in turn until all communication modules that contain len communication blocks have been compared; here n is a natural number and n > 1 (a loop must repeat at least twice).
In the present embodiment, comparing two adjacent communication modules in turn may specifically comprise comparing the corresponding communication blocks of the two modules in order. If the first communication block of the n-th communication module is the same as the first communication block of the (n+1)-th communication module, the second communication blocks of the two modules are compared, and so on. As soon as the m-th communication block of the n-th communication module differs from the m-th communication block of the (n+1)-th communication module, the two modules are judged to be different, and the comparison of their remaining blocks stops immediately. If every pair of corresponding communication blocks is the same, the two modules are judged to be identical. Here m is a natural number and m ≤ len.
Using this method of comparing adjacent communication modules, the comparison continues from the (n+1)-th communication module (that is, the (n+1)-th communication module is compared with the (n+2)-th) until all communication modules that contain len communication blocks have been compared. In this way all loops whose loop body consists of len communication blocks can be found.
In other embodiments, the corresponding communication blocks of two adjacent communication modules may be compared in other ways and need not be compared in order; for example, the comparison may start from the corresponding communication blocks in the middle of the two modules and proceed toward the blocks before and after them.
In step S503, merging the loops found and updating the second communication track file means replacing, for every loop found in the second communication track file whose loop body consists of len communication blocks, the consecutively repeated communication modules (the loop part) with the loop-framework notation. For example, suppose two communication modules A and B found in the original second communication track file each have 3 communication blocks (len = 3), A repeats 2 times, and B repeats 3 times. The original form "AABBB" can then be replaced by the loop-framework notation "[2][3]". After steps S501 to S503 are repeated, if "[2][3]" is itself found to repeat 2 times, the form "AABBB AABBB" can be replaced by the notation "[2[2][3]]". During the discovery of loop rules, the second communication track file is updated continually until no more loops can be found.
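The repeated search-and-merge of steps S501 to S504 can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: each item of the input sequence stands for one already identified unit (a string), a merged loop is stored as a (count, body) tuple, and all function names are hypothetical.

```python
def find_min_len(seq):
    """S501: smallest body length for which some window repeats back to back."""
    for length in range(1, len(seq) // 2 + 1):
        for i in range(len(seq) - 2 * length + 1):
            if seq[i:i + length] == seq[i + length:i + 2 * length]:
                return length
    return None


def merge_loops(seq, length):
    """S502/S503: replace each run of n > 1 identical windows by (n, body)."""
    out, i = [], 0
    while i < len(seq):
        body = seq[i:i + length]
        n = 1
        while len(body) == length and seq[i + n * length:i + (n + 1) * length] == body:
            n += 1
        if n > 1:
            out.append((n, tuple(body)))
            i += n * length
        else:
            out.append(seq[i])
            i += 1
    return out


def compress(seq):
    """S504: repeat until no loop is left; len is re-determined after every merge."""
    seq = list(seq)
    while True:
        length = find_min_len(seq)
        if length is None:
            return seq
        seq = merge_loops(seq, length)


def render(seq):
    """Print only the loop framework and loop counts, e.g. [2[2][3]]."""
    def one(item):
        if isinstance(item, tuple):
            n, body = item
            return "[" + str(n) + "".join(one(b) for b in body) + "]"
        return ""  # a plain, unrepeated unit does not appear in the framework
    return "".join(one(item) for item in seq)
```

Applied to the example in the text, render(compress("AABBB")) gives "[2][3]" and render(compress("AABBBAABBB")) gives "[2[2][3]]".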
In the present embodiment, step S501 determines the minimum number of communication blocks, len, in any loop body. In other embodiments, the maximum number of communication blocks in any loop body may be determined first, followed by the corresponding steps for discovering loop rules. In practice, however, starting from the minimum number of communication blocks len gives better results.
The nested loop structures of the NPB test programs obtained with the above method of discovering the loop rules of communication modules are shown in Table 2, where each number is a loop count and the nesting of square brackets represents the nesting of loops.
Table 2: Loop nesting of the NPB test programs
BT(A,36) [5][5][5][5][5][5][200[5][5][5][5][5][5]]
CG(A,32) [25[2][2]][15[25[2][2]]]
SP(A,36) [5][5][5][5][5][5][400[5][5][5][5][5][5]]
MG(A,32) [2[2]][6[2]][2[2][2][2]][15[2]][2][2][6[2]][3[2[2][2][2]][21[2]]][2[2][2][2]][15[2]]
After the algorithm events have been formed in step S103, step S104 is performed: different run scales are provided and the above steps are repeated, so that running the application program multiple times at multiple different run scales forms multiple algorithm events. The run scales differ in the number of participating processes; for example, four different run scales may be provided with 4, 8, 16, and 32 participating processes respectively.
After the multiple algorithm events have been formed from the original communication track files generated by running the application program at the different run scales, step S105 is performed: determine the relation among the multiple algorithm events. This relation reflects the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology. Step S105 is the process of logic generation. Logic generation means generalizing the algorithm events further: the concrete running environment is abstracted into a variable running environment, and the concrete relations among the participants of a particular run are abstracted into relative relations, forming an intermediate representation with clearer logical relations. This representation is detached from the concrete environment and parameters under which the communication track was extracted, so a communication track for a new environment and new parameters can be formed by changing certain attributes. Alternatively, the logical relations of the run can be recorded while the communication track is being extracted, so that the logical representation of the communication track is generated directly.
The processing above yields a complete communication track of the application program with its loop structure. The communication tracks obtained with different numbers of participating processes differ from one another, so a single run of the application program cannot provide the information needed to expand the communication track. Communication tracks from multiple runs with different numbers of participating processes are required, and the rules of expansion are derived from them.
Expanding a communication track generally takes three steps: (1) run the application program multiple times and obtain the communication track of each run; (2) convert the concrete content of the application's communication tracks into a corresponding logical representation, that is, find the relation between the communication track and the number of participating processes and the relation between the communication track and the inter-process topology (the membership relations among processes); (3) expand the communication track according to the logical representation.
The logical representation of a communication track covers three aspects: the complete communication pattern, the explicit loop structure, and the rules by which these two change with the run scale (the number of processes and the membership relations among processes). The communication pattern and the loop structure are the foundation. By running the application program multiple times, the communication pattern and loop structure of the communication track of each run are obtained, and the communication and loops are related to the run scale to determine the rules. Note that the logical representation of the communication track makes the relations between the communication track and the number of participating processes, and between the communication track and the inter-process topology, clearer and easier to see. In some cases, however (when the rules are relatively simple), the rules can be found directly from the algorithm events formed, without forming a logical representation.
In the present embodiment, determining the relation among the multiple algorithm events comprises forming a logical representation of the multiple algorithm events, which in turn comprises forming the logical structure of the loop framework. Fig. 10 is a schematic flowchart of forming the logical structure of the loop framework. Referring to Fig. 10, the specific steps are as follows:
Step S601: run the application program k times with different numbers of participating processes to obtain k third communication track files and their k loop frameworks, each loop framework comprising at least one first loop framework (if a loop framework contains no first loop framework, it is a loop formed by a single communication module, which can be compared directly and to which the subsequent steps do not apply); here k is a natural number and k > 1. Specifically, each run of the application program generates one third communication track file, and each run uses a different number of participating processes, generally in increasing order. After k runs, k third communication track files are obtained; each contains loop rules, that is, an explicit loop framework (the k third communication track files correspond to k loop frameworks) and explicit loop counts for the loop bodies.
Step S602: take from each third communication track file the first loop framework with the largest number of communications, obtaining k first loop frameworks corresponding to the k loop frameworks. A first loop framework is a loop framework nested inside a loop framework, and may also be called a sub-loop framework. A sub-loop framework may itself contain sub-loop frameworks; to distinguish the levels, the first level of sub-loop framework is called the first loop framework and, in the subsequent steps, the second level is called the second loop framework. The first loop framework with the largest number of communications in each third communication track file (which may itself contain nested loop frameworks) is found by calculation. One process sending one message to another process counts as one communication. For example, if process 0 sends a message to process 1, process 1 sends a message to process 2, and process 2 sends a message to process 3, the number of communications is 3. The calculation in step S602 includes nested loop frameworks. For example, suppose communication module A has 3 communications and repeats 2 times, communication module B has 4 communications and repeats 3 times, and the sequence formed by A repeated 2 times and B repeated 3 times itself repeats 2 times. The number of communications of this first loop framework, in which the 2 repetitions of A and the 3 repetitions of B are nested, is (3*2+4*3)*2 = 36. If there is also a communication module C that has 10 communications and repeats 4 times, the number of communications of that first loop framework is 40. Assuming no other first loop framework exists, the first loop framework in which communication module C repeats 4 times is the first loop framework with the largest number of communications.
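The communication count used in step S602 is a recursive sum over the nested loop framework. The short Python sketch below reproduces the two worked figures from the text; the (count, body) tuple layout is an assumption made for this illustration.

```python
def comm_count(node):
    """Number of communications of a loop framework.

    A node is either an int (the communications of one communication module)
    or a tuple (loop_count, [child nodes]).
    """
    if isinstance(node, int):
        return node
    loop_count, body = node
    return loop_count * sum(comm_count(child) for child in body)


# A: 3 communications, repeated 2 times; B: 4 communications, repeated 3 times;
# the pair repeated 2 times as a whole.
framework_ab = (2, [(2, [3]), (3, [4])])
# C: 10 communications, repeated 4 times.
framework_c = (4, [10])

# Step S602 picks the first loop framework with the largest count.
largest = max([framework_ab, framework_c], key=comm_count)
```

Here comm_count(framework_ab) is (3*2+4*3)*2 = 36 and comm_count(framework_c) is 40, so the framework of module C is selected.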
Step S603: compare whether the k first loop frameworks are the same. If they are, perform step S603a: merge the loop counts of the corresponding first loop frameworks. If they are not, perform step S603b: merge the loop counts on the basis of the first loop framework of the run with the largest number of processes. Then go to step S604. When the k first loop frameworks are the same, there are two cases:
In the first case, the k first loop frameworks are the same and the corresponding loop counts are also the same. Merging the loop counts of the corresponding first loop frameworks then means taking the logical structure of the first loop framework of any one run;
In the second case, the k first loop frameworks are the same and the loop counts differ, but a rule can be found by which the loop counts change with the number of participating processes. Merging the loop counts of the corresponding first loop frameworks then means summarizing that rule and obtaining a logical structure of the first loop framework that reflects it. As a simple example, if the first loop framework obtained with 36 participating processes is [2[2][3]], with 72 processes it is [4[4][6]], and with 108 processes it is [6[6][9]], it can be inferred that the first loop framework with 144 participating processes (not actually run) would be [8[8][12]].
In general, when the k first loop frameworks are the same, the loop counts also tend to follow some rule. The purpose of the embodiments of the present invention is to find that rule, so that the logical structure of the loop framework at larger run scales can be derived.
Step S604: determine whether the third communication track files contain any first loop frameworks that have not yet been compared. If so, repeat steps S602 and S603 until all first loop frameworks in the third communication track files have been compared; otherwise, the logical structure of the loop framework has been formed. After the first loop frameworks with the largest number of communications have been compared, step S602 is repeated to take, from the first loop frameworks not yet compared, the one with the largest number of communications. Relative to the first loop framework taken in the first execution of step S602, the one taken in the second execution is the first loop framework with the second-largest number of communications, and so on until all first loop frameworks have been compared. Note that when step S602 finds more than one first loop framework with the largest number of communications, the one that appears earliest in the third communication track is taken.
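The second case can be illustrated with the example figures from the text. The Python sketch below assumes the simplest possible rule, namely that every loop count is proportional to the number of processes; this is only one candidate rule, and the function name and data layout are hypothetical.

```python
from fractions import Fraction


def extrapolate(counts_by_procs, target_procs):
    """Scale loop counts to a new process count, assuming proportionality.

    counts_by_procs maps a process count to the loop counts of one framework,
    flattened in framework order. Returns None when the counts are not
    proportional to the process count or do not scale to whole numbers.
    """
    ratios = None
    for procs, counts in sorted(counts_by_procs.items()):
        current = [Fraction(count, procs) for count in counts]
        if ratios is None:
            ratios = current
        elif current != ratios:
            return None  # not proportional; another rule would have to be tried
    scaled = [ratio * target_procs for ratio in ratios]
    if any(value.denominator != 1 for value in scaled):
        return None
    return [int(value) for value in scaled]


# [2[2][3]] at 36 processes, [4[4][6]] at 72, [6[6][9]] at 108.
runs = {36: [2, 2, 3], 72: [4, 4, 6], 108: [6, 6, 9]}
```

With these inputs, extrapolate(runs, 144) returns [8, 8, 12], which corresponds to the framework [8[8][12]] given in the text.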
Step S603b, merging the loop counts on the basis of the first loop framework of the run with the largest number of processes, specifically comprises:
Step S701: calculate the second loop framework with the largest number of communications in the first loop framework of the run with the largest number of processes, compare it with the second loop framework with the largest number of communications in each of the other first loop frameworks, and merge the loop counts. If the k runs produce different first loop frameworks, then, for the purpose of expansion, the first loop framework obtained in the run with the most participating processes should be closest to the first loop framework that the expansion would produce, so the loop counts are merged on the basis of that first loop framework. For example, if the first loop frameworks obtained with 16, 32, and 64 participating processes are [12], [5], and [16] respectively, no rule can be found; when the first loop framework for 144 participating processes has to be derived, the loop counts are merged on the basis of the first loop framework obtained with 64 participating processes.
Step S702: repeat the above step, comparing the second loop frameworks of the first loop framework of the run with the largest number of processes in turn and merging the loop counts, until all second loop frameworks in that first loop framework have been compared.
For the comparison and merging of loop counts in step S701, refer to step S603.
As can be seen, the principle behind the above method is that the loop frameworks with the largest communication volume in different communication tracks correspond to one another. This agrees with reasoning about the source program: the code that produces the largest communication volume (computation volume) is always the same section of code.
The loop framework is the loop nesting information in the communication track (excluding loop counts). The loop count is the number of times the loop body has to be repeated. For the same program the execution flow is the same, so the loop frameworks of its communication tracks are the same or similar. If the loop frameworks obtained from multiple runs of a program differ, the logical structure of the loop framework can be obtained with the above method.
Merging loop counts means finding the rule by which the loop count changes with the number of processes. Different applications follow different rules. In general, applications use process counts that follow a pattern (for example, squares of natural numbers or powers of 2), so the loop counts also follow a pattern, particularly for the loops that carry the application's main communication volume (computation volume). In the present embodiment, all common rules of change are listed and the loop counts are checked against them in turn to find the rule by which the loop count changes with the number of processes. The logic of the loop framework and the loop counts determines the overall framework of the application, and within that framework lie the communication tracks and computation tracks of the application.
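Checking loop counts against a list of candidate rules can be sketched as follows. The text does not enumerate the common rules, so the four candidates below (constant, proportional to the process count, to its square root, or to its base-2 logarithm) are assumptions chosen to fit the square and power-of-2 process counts mentioned above.

```python
import math
from fractions import Fraction


def _sqrt(p):
    """Exact square root, or None when p is not a perfect square."""
    root = math.isqrt(p)
    return root if root * root == p else None


def _log2(p):
    """Exact base-2 logarithm, or None when p is not a power of 2."""
    return p.bit_length() - 1 if p > 0 and p & (p - 1) == 0 else None


# Candidate rules, tried in order (an assumed list, not taken from the patent).
RULES = [
    ("constant", lambda p: 1),
    ("proportional to p", lambda p: p),
    ("proportional to sqrt(p)", _sqrt),
    ("proportional to log2(p)", _log2),
]


def find_rule(samples):
    """samples maps process count to loop count; returns (rule name, factor) or None."""
    for name, rule in RULES:
        factors = set()
        for procs, count in samples.items():
            base = rule(procs)
            if not base:  # rule undefined (None) or zero for this process count
                factors = set()
                break
            factors.add(Fraction(count, base))
        if len(factors) == 1:
            return name, factors.pop()
    return None
```

A result of None corresponds to the irregular case described in step S701, where the run with the largest number of processes is used as the basis instead.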
To make full use of computing resources, a parallel program supplies data to all processes during the run, which gives rise to communication. In general, a communication block is formed when all processes have taken part in communication. A communication block also has finer internal structure: several processes may communicate with one another but not with other processes, and such finer structures are called micro communication blocks. The processes inside a micro communication block have a still finer structure: several processes may form a continuous communication structure, called a serial communication.
Continuous communication means that the n-th process sends data to the (n+1)-th process, the (n+1)-th process receives the data and sends data to the (n+2)-th process, the (n+2)-th process receives the data and sends data to the (n+3)-th process, and so on, forming a chain of communication. For example, Fig. 11 shows a communication block from the communication track of the NPB MG test program at computation scale C with 64 participating processes. The whole communication block is divided into four micro communication blocks, namely micro communication block 101, micro communication block 102, micro communication block 103, and micro communication block 104, each of which comprises 16 processes with consecutive process numbers. Each micro communication block can in turn be divided into four serial communications.
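Because the processes of a micro communication block communicate only among themselves, the micro communication blocks of a communication block are the connected groups of its send/receive pairings. The Python sketch below finds them with a small union-find structure; this is one possible way to compute the grouping, not a procedure stated in the patent.

```python
def micro_blocks(pairings):
    """Group process numbers into sets that communicate only among themselves."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in pairings:
        parent[find(src)] = find(dst)  # src and dst belong to the same group

    groups = {}
    for proc in parent:
        groups.setdefault(find(proc), []).append(proc)
    return sorted(sorted(group) for group in groups.values())
```

For the pairings 0 → 1 → 2 → 3 and 4 → 5 → 6 → 7, the function returns two groups, [0, 1, 2, 3] and [4, 5, 6, 7], each of which is also a single continuous chain.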
Forming the logical representation of the plurality of algorithm events further comprises forming the logical structure of a communication block. To represent the regularity of communication blocks effectively, several logical structures can be used:
(1) The first logical structure: every serial communication of the communication block has the same number of communications, all processes take part in the communication, and the difference between adjacent process numbers in each row is equal. The first logical structure can be represented by the expression P0 F [P1, P2, P3] F_, where:
P0 is the first process number of the communication block;
P1 indicates whether the head and tail process numbers of a serial communication are the same (for example, "1" can mean that they are the same and "0" that they are different);
P2 is the difference between adjacent process numbers in a serial communication;
P3 is the number of distinct (non-repeated) processes in a serial communication;
F is the difference between the numbers of corresponding processes when different rows are compared, F = (Fi)...(F2)(F1);
F_ is the number of times [P1, P2, P3] needs to be repeated, F_ = *N1*N2...*Ni;
i is a natural number giving the number of times the expression of the first logical structure needs to be expanded.
Here, (Fi)...(F2)(F1) and *N1*N2...*Ni are merely notation and do not denote any concrete logical operation. For any 1 ≤ p ≤ i, the expression preceding Np is repeated Np times, and the first process number of successive repetitions differs by Fp. F and F_ appear in correspondence: for example, when F = (F2)(F1), F_ = *N1*N2, with (F1) corresponding to *N1 and (F2) corresponding to *N2. The expression P0 F [P1, P2, P3] F_ must be expanded layer by layer, from the outermost layer inward, to restore the original logical structure of the communication block. Specifically, the first layer, (Fi) with its corresponding *Ni, is expanded first in combination with P0, and so on until the last layer, (F1) with its corresponding *N1, has been expanded; finally each expanded expression is converted to obtain the original logical structure of the communication block.
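The layer-by-layer expansion can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and argument layout are hypothetical, the expression is passed in already parsed, and the final conversion of each expanded expression into an actual serial communication is left out because the text does not spell it out.

```python
def expand_layers(p0, diffs, body, repeats):
    """Expand P0 (Fi)...(F1) [P1, P2, P3] *N1...*Ni, outermost layer first.

    diffs   -- [Fi, ..., F1], in the order they are written
    repeats -- [N1, ..., Ni], in the order they are written
    body    -- the (P1, P2, P3) triple, carried through unchanged
    Returns a list of (first process number, body) pairs.
    """
    if len(diffs) != len(repeats):
        raise ValueError("F and F_ must have the same number of layers")
    starts = [p0]
    # (Fi) pairs with *Ni, so the leftmost difference goes with the
    # rightmost repeat count.
    for diff, count in zip(diffs, reversed(repeats)):
        starts = [s + k * diff for s in starts for k in range(count)]
    return [(s, body) for s in starts]

# One layer: first process numbers 0, 16, 32, 48.
outer = expand_layers(0, [16], (1, -4, 4), [4])
# Two layers: 16 expressions with first process numbers 0, 1, 2, 3, 16, 17, ...
full = expand_layers(0, [16, 1], (1, -4, 4), [4, 4])
```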
Take the expansion of the expression 0(16)(1)[1,-4,4]*4*4 as an example:
First, the first layer of the expression is expanded, giving the following 4 expressions:
0(1)[1,-4,4]*4
16(1)[1,-4,4]*4
32(1)[1,-4,4]*4
48(1)[1,-4,4]*4
Then respectively above-mentioned 4 expression formulas are launched, are become expression formula below:
Finally, to above-mentioned 16 expression formulas conversion, be just reduced into original communication block logical organization:
For example described the first logical organization is elaborated below.
Example 1, supposes that in NPB, BT test procedure is under C calculating scale, and the expression formula of same communication block in the time that 16,36,64 process numbers participate in operation is respectively:
Therefore, can draw rule, the relation between the expression of this communication block and the process of participation number is:
0 (sqrt) [1,1, sqrt] * sqrt, wherein sqrt represents the square root of process number.
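As a small illustration, the rule stated above can be evaluated for a concrete process count. The function name is hypothetical, and the sketch assumes that only perfect-square process counts satisfy the rule.

```python
import math

def bt_block_expression(num_procs):
    """Instantiate 0 (sqrt) [1,1, sqrt] * sqrt for a given process count."""
    root = math.isqrt(num_procs)
    if root * root != num_procs:
        raise ValueError("process count must be a perfect square")
    return f"0({root})[1,1,{root}]*{root}"

# Expression for a hypothetical target scale of 144 processes.
target = bt_block_expression(144)
```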
Example 2: for the CG test program of NPB at computation scale C, the expressions of three corresponding communication blocks when 16, 32 and 64 processes participate in the run are, respectively:
From these, the rule by which this group of communication blocks changes with the number of processes can be drawn:
where P' = P/2, P denotes the number of processes, and this continues until the second-to-last parameter is no greater than the last parameter.
After this processing, each communication block in the communication track is expressed as an expression parameterized by the number of communicating processes, so the representation adapts to communication at any process count that satisfies the rule of change.
(2) The second logical structure: every serial communication of the communication block has the same number of communications, all processes take part in the communication, and the process numbers in each row form an arithmetic progression, but the position of the minimum process number may vary. The second logical structure can be represented by the expression <H0|S>{H1, H2}G, where:
H0 is the minimum process number in a serial communication;
H1 indicates whether the process numbers of the first-column and last-column communications are the same (for example, "1" can mean that the head and tail process numbers are the same and "0" that they are different);
H2 is the difference between adjacent process numbers after they are rearranged in descending order;
S is the position of the minimum process number;
G is the number of times {H1, H2} needs to be repeated, G = *M1*M2...*Mj;
j is a natural number giving the number of times the expression of the second logical structure needs to be expanded.
*M1*M2...*Mj is likewise merely notation and does not denote any concrete logical operation. G here is similar to F_ in the first logical structure, and the expression is likewise expanded layer by layer according to the repetition counts.
(3) User-defined logical structure. Because communication patterns between applications vary widely, the new communication patterns of users cannot be predicted. A user interface is therefore reserved so that users can themselves define how communication blocks are processed and reduced and how communication rules are found. After the above transformation, each large communication block becomes an expression reflecting a communication rule, and the numbers in the expression are related to the number of processes; in addition, there are rules of change among communication blocks. The next task is to find these relations and rules.
For example, when every serial communication has the same number of communications but only some of the processes take part in the communication block (and the process numbers that take part are regular), each process can be represented by its position among all processes. For instance, a serial communication 21 > 23 > 21 with 64 participating processes is expressed as (22/64) > (24/64) > (22/64), from which it follows that with 32 participating processes this communication is 10 > 11 > 10.
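The position-based rescaling in this example can be sketched in Python. The function name is hypothetical, and the sketch assumes that the position of process p among N processes is (p + 1)/N, which is what the 21 to 22/64 mapping in the example suggests.

```python
from fractions import Fraction

def rescale_chain(chain, old_procs, new_procs):
    """Map each process number to the same relative position at a new scale."""
    result = []
    for proc in chain:
        position = Fraction(proc + 1, old_procs)  # e.g. 21 of 64 -> 22/64
        scaled = position * new_procs
        if scaled.denominator != 1:
            raise ValueError("position does not map to a whole process number")
        result.append(int(scaled) - 1)
    return result

# The serial communication from the text, moved from 64 to 32 processes.
chain_32 = rescale_chain([21, 23, 21], 64, 32)  # [10, 11, 10]
```

Exact fractions are used so that a position with no whole-number counterpart at the new scale is reported instead of being silently rounded.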
After the relation among the plurality of algorithm events has been determined in step S105, step S106 is finally performed: the target communication track file of the application program running at the target scale is generated on the basis of the relation among the algorithm events, the operation scale being smaller than the target scale. Specifically, if a logical representation of the plurality of algorithm events has been generated in step S105, certain parameters in that logical representation can be configured as the study requires, and the target communication track file at the target scale is generated. In the present embodiment, running the application program at the target scale refers to the application program running on a larger-scale computer system (without actually being run there); the target scale comprises the several different scales of the target communication track files to be obtained by expansion, corresponding to the operation scales described above. For example, after original communication track files have been obtained by running at several scales (operation scales) with 4, 8, 16 or 32 participating processes, the communication track expanding method provided by the embodiment of the present invention can yield the target communication track files that would be produced by running at several scales (target scales) with 64, 128, 256 or 512 participating processes. In general the operation scale is smaller than the target scale: it is precisely because the communication track of a large-scale application is difficult to obtain by running it directly on a large-scale host that the communication track expanding method provided by the embodiment of the present invention is needed, in which the application is run several times at small scale on the target machine, the rule by which the communication track changes is found, and the communication track of a large-scale run is obtained by expansion. An operation scale larger than the target scale is therefore of little practical significance, although that case is not excluded.
It should be noted that, in the present embodiment, determining the grammatical relation among the atomic events is explained by the example of identifying the communication pattern among the atomic events, grammatical events are explained by the example of send/receive pairing in point-to-point communication (specifically, analysing the MPI_Send and MPI_Recv functions), and algorithm events are explained by the example of loops and loop nesting. In other embodiments, grammatical events can take other forms. For example, a non-blocking function (MPI_Isend or MPI_Irecv) or the wait function MPI_Wait is an atomic event, and the event formed from the correspondence (grammatical relation) between them is also a grammatical event. Likewise, algorithm events can take other forms; a branch, for example, is a typical algorithm event.
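Send/receive pairing of atomic events into grammatical events can be illustrated with the following Python sketch. It matches events on source process number, destination process number, message identifier and communication domain, the fields named in the claims; the Event record, the function name and the sample events are assumptions for illustration and do not represent the actual track file format.

```python
from collections import namedtuple

# Hypothetical atomic event record; kind is "send" or "recv".
Event = namedtuple("Event", "kind src dst tag comm")

def pair_events(events):
    """Pair each send with the matching receive.

    Returns (pairs, leftovers): pairs is a list of (send, recv) tuples,
    leftovers are the events still waiting for a partner.
    """
    pending = {}  # (src, dst, tag, comm) -> list of unmatched events
    pairs = []
    for ev in events:
        key = (ev.src, ev.dst, ev.tag, ev.comm)
        waiting = pending.get(key, [])
        # Look for a temporarily stored event of the opposite kind.
        match = next((w for w in waiting if w.kind != ev.kind), None)
        if match is not None:
            waiting.remove(match)
            pairs.append((match, ev) if match.kind == "send" else (ev, match))
        else:
            pending.setdefault(key, []).append(ev)
    leftovers = [ev for evs in pending.values() for ev in evs]
    return pairs, leftovers

events = [
    Event("send", 0, 1, 7, "WORLD"),
    Event("recv", 0, 1, 7, "WORLD"),
    Event("recv", 2, 3, 7, "WORLD"),  # no matching send yet
]
pairs, leftovers = pair_events(events)
```

Non-blocking calls could be handled in the same way by first associating each MPI_Isend or MPI_Irecv record with its MPI_Wait, which this sketch does not attempt.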
Based on the above communication track expanding method, the embodiment of the present invention also provides a communication track drive simulation method, comprising:
Generating a target communication track file using the above communication track expanding method;
Inputting the generated target communication track file into a target machine architecture simulator and performing a simulation run.
Taking the communication pattern logic of the NPB test programs as an example, by changing the number of participating processes the communication track can be expanded to any scale that satisfies the rule of change of the process count. The expansion only needs to evaluate the communication pattern expressions and does not require multiple processes to take part. The communication pattern was expanded to a scale of 128 (or 144) participating processes; the ratio of the total number of communications after expansion to the actual total number of communications is shown in Figure 12. The communication pattern, that is, the manner of inter-process communication, can express the rule by which the communication track changes with the number of processes, particularly for applications with very regular communication patterns such as BT, SP and CG. For the MG application, the vast majority of the communication is clearly regular, and only a small amount of communication between some processes has no obvious regularity; for example, the communication in which only some of the processes take part in MG accounts for 0.5% of the total communication volume and can be ignored. Through the loop logic and the communication block logic, the communication track of an application program can be expressed in terms of the number of processes and can therefore be expanded to a run with any number of processes (provided the process count satisfies the rule of change).
Based on the above communication track expanding method, the embodiment of the present invention also provides a communication track expanding device. Figure 13 is a schematic structural diagram of the communication track expanding device provided by the embodiment of the present invention. Referring to Figure 13, the communication track expanding device comprises:
An algorithm event forming unit 20, configured to form an algorithm event based on the original communication track file generated by running the application program at one operation scale;
A control unit 10, configured to control the algorithm event forming unit 20 so that the application program is run at a plurality of different operation scales to form a plurality of algorithm events, where running the application program at one operation scale forms one corresponding algorithm event and the number of participating processes differs among the operation scales;
A logic generation unit 30, configured to determine the relation among the plurality of algorithm events, the relation reflecting the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology;
A target communication track file generation unit 40, configured to generate, based on the relation among the plurality of algorithm events, the target communication track file of the application program running at the target scale, the operation scale being smaller than the target scale.
The algorithm event forming unit 20 further comprises:
An extraction unit 20a, configured to extract the original communication track file of each process of the application program running at one operation scale;
A syntax analysis unit 20b, configured to match the attributes of the atomic events in the original communication track file, determine the grammatical relation among the atomic events, and organize the atomic events according to the grammatical relation to form a grammatical event sequence;
An algorithm analysis unit 20c, configured to determine the meaning and mutual relation of the grammatical events in the grammatical event sequence and to construct an algorithm event according to the mutual relation of the grammatical events.
In addition, the algorithm event forming unit may further comprise an event analysis unit (not shown) connected to the extraction unit and to the syntax analysis unit, configured to analyse the atomic events of the original communication track file and extract a plurality of attributes of the atomic events before the attributes of the atomic events in the original communication track file are matched.
For the specific implementation of the communication track expanding device, reference may be made to the communication track expanding method described above; it is not repeated here.
Based on the above communication track expanding device, the embodiment of the present invention also provides a communication track drive simulation system, comprising a target machine architecture simulator and the above communication track expanding device, wherein:
The communication track expanding device is configured to generate by expansion the target communication track file of the application program running at the target scale and to input it to the target machine architecture simulator;
The target machine architecture simulator is configured to perform a simulation run driven by the target communication track file and to produce a simulation result.
With the above communication track drive simulation method and system, the generated target communication track file can be input directly into the target machine architecture simulator for simulation, which can greatly improve the simulation precision of the simulator and reduce the simulation cost, thereby saving development time and reducing development cost.
In summary, the communication track expanding method and device and the communication track drive simulation method and system provided by the embodiment of the present invention have at least the following beneficial effects:
The original communication track files obtained by running the application program at a plurality of different operation scales are extracted, and a global regularity analysis of the communication track is applied: the communication track logic is analysed, the correspondence between communications is located, the loop structure of the communication track is located, and the rules relating the communication track to the operation scale are analysed, so that the communication track of a large-scale run is obtained by expansion to support computer architecture simulation at the target scale. The communication track expanding method does not need to run the application on a real large-scale host and can quickly and directly generate the communication track of the application running at the target scale. It also obtains the communication rules of the application without analysing the application's source code, which effectively reduces the difficulty of analysing the original program.
The logical structure representation of communication blocks in the communication track facilitates the expression and expansion of the communication track, making it easier to reflect the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology.
Inputting the target communication track file generated by the communication track expanding method into the target machine architecture simulator for simulation can improve the simulation precision of the simulator and reduce the simulation cost, thereby saving development time and reducing development cost.
Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit the present invention. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the present invention. Therefore, any simple modification, equivalent variation or adaptation made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of the technical solution of the present invention, falls within the scope of protection of the technical solution of the present invention.

Claims (26)

1. A communication track expanding method, characterized by comprising:
Forming a plurality of algorithm events respectively, based on the original communication track files generated by running an application program at a plurality of different operation scales, the number of participating processes differing among the operation scales;
Determining the relation among the plurality of algorithm events, the relation reflecting the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology;
Generating, based on the relation among the plurality of algorithm events, a target communication track file of the application program running at a target scale, the operation scale being smaller than the target scale; wherein
Forming an algorithm event based on the original communication track file generated by running the application program at each operation scale comprises:
Extracting the original communication track file of each process of the application program running at that operation scale;
Matching the attributes of the atomic events in the original communication track file, determining the grammatical relation among the atomic events, and organizing the atomic events according to the grammatical relation to form a grammatical event sequence, wherein determining the grammatical relation among the atomic events comprises identifying the communication pattern among the atomic events;
Determining the meaning and mutual relation of the grammatical events in the grammatical event sequence, and constructing an algorithm event according to the mutual relation of the grammatical events, the algorithm event comprising one or more of a loop, a loop nest and a branch.
2. The communication track expanding method according to claim 1, characterized by further comprising, before matching the attributes of the atomic events in the original communication track file, analysing the atomic events in the original communication track file and extracting a plurality of attributes of the atomic events, the attributes comprising the name and parameter values of a message passing interface function and the file name and line number at which the function occurs.
3. The communication track expanding method according to claim 1, characterized in that, after the attributes of the atomic events in the original communication track file are matched, a first communication track file is generated for each process, and identifying the communication pattern among the atomic events comprises the following steps:
Taking one communication track from the first communication track file of each process and, if all of them are collective communication tracks, recording all the collective communication tracks;
If all of them are point-to-point communication tracks, performing point-to-point communication syntax analysis and completing the search for communication blocks;
Ending the identification of the communication pattern among the atomic events when all the communication tracks in the first communication track file of each process have been processed.
4. The communication track expanding method according to claim 3, characterized in that the point-to-point communication syntax analysis comprises:
Taking out the track information contained in the point-to-point communication track, the track information comprising message type, source process number, destination process number, message identifier and communication domain;
Judging whether the temporarily stored track information contains track information that pairs with it; if so, recording the pairing, and if not, temporarily storing the track information of the point-to-point communication track.
5. The communication track expanding method according to claim 4, characterized in that completing the search for communication blocks comprises:
For every temporarily stored communication track, searching the communication tracks of the process it pairs with and checking whether a send/receive pair can be formed; if so, recording the pairing, and otherwise temporarily storing the track information contained in that communication track;
For an incomplete communication block, searching for the communication tracks of the corresponding processes and performing the point-to-point communication syntax analysis;
Judging whether all the incomplete communication blocks together with the newly formed send/receive pairs have formed a communication block and, if so, recording the communication block;
Judging whether any incomplete communication block remains; if so, continuing the search for communication blocks, and otherwise ending the search for communication blocks.
6. The communication track expanding method according to claim 5, characterized in that judging whether all the incomplete communication blocks together with the newly formed send/receive pairs have formed a communication block and, if so, recording the communication block comprises:
Adding a newly formed send/receive pair to an incomplete communication block;
Judging that a communication block has been formed when all processes have taken part in communication one or more times;
Recording the communication block so formed;
Repeating the above steps until all newly formed send/receive pairs have been processed.
7. The communication track expanding method according to claim 3, characterized in that determining the meaning and mutual relation of the grammatical events in the grammatical event sequence comprises discovering the loop rule with a communication module as the loop body, the communication module being contained in a second communication track file generated after the search for communication blocks is completed, the communication module comprising at least one communication block, the loop rule comprising the loop framework and the loop count of the loop body, and the loop rule being discovered by the following steps:
Determining the minimum number len of communication blocks among all loop bodies, where len is a natural number;
Finding the loops whose loop body consists of len communication blocks;
Merging the loops found and updating the second communication track file;
Repeating the above steps in turn until no loop remains in the second communication track file, and generating a third communication track file carrying the loop rule.
8. The communication track expanding method according to claim 7, characterized in that finding the loops whose loop body consists of len communication blocks comprises:
Finding in the second communication track file all communication modules comprising len communication blocks and, starting from the first such communication module, comparing adjacent communication modules pairwise in turn until the n-th communication module differs from the (n+1)-th communication module, whereupon a loop whose loop body consists of len communication blocks has been found, the loop count of the loop body being n;
Continuing to compare adjacent communication modules pairwise in turn starting from the (n+1)-th communication module, until all communication modules comprising len communication blocks have been compared; where n is a natural number and n > 1.
9. The communication track expanding method according to claim 8, characterized in that comparing adjacent communication modules pairwise in turn comprises:
Comparing the corresponding communication blocks of two adjacent communication modules in turn; if the m-th communication block of the n-th communication module differs from the m-th communication block of the (n+1)-th communication module, judging that the n-th communication module differs from the (n+1)-th communication module; if all corresponding communication blocks of the two communication modules are identical, judging that the n-th communication module is identical to the (n+1)-th communication module; where m is a natural number and m ≤ len.
10. The communication track expanding method according to claim 7, characterized in that determining the relation among the plurality of algorithm events comprises forming a logical representation of the plurality of algorithm events, forming the logical representation comprises forming the logical structure of the loop framework, and the steps are as follows:
Step S601: running the application program k times to obtain k third communication track files, with different numbers of participating processes, and their k loop frameworks, each loop framework comprising at least one first loop framework, where k is a natural number and k > 1;
Step S602: taking from each third communication track file the first loop framework with the largest number of communications, obtaining k first loop frameworks corresponding to the k loop frameworks;
Step S603: comparing whether the k first loop frameworks are identical; if they are identical, merging the loop counts of the corresponding first loop frameworks; if they differ, merging the loop counts on the basis of the first loop framework of the largest process count;
Step S604: repeating step S602 and step S603 to compare the remaining first loop frameworks in the third communication track files, until all first loop frameworks have been compared;
Wherein, in step S602, when more than one first loop framework has the largest number of communications, the first loop framework that comes first in order in the third communication track is taken.
11. The communication track expanding method according to claim 10, characterized in that each first loop framework comprises at least one second loop framework, and merging the loop counts on the basis of the first loop framework of the largest process count comprises:
Comparing the second loop framework with the largest number of communications in the first loop framework of the largest process count with the second loop frameworks with the largest number of communications in the other first loop frameworks, and merging the loop counts;
Repeating the above step to compare the remaining second loop frameworks in the first loop framework of the largest process count and merge the loop counts, until all second loop frameworks in the first loop framework of the largest process count have been compared.
12. The communication track expanding method according to claim 10, characterized in that forming the logical representation of the plurality of algorithm events further comprises forming the logical structure of a communication block, the logical structure of the communication block being a first logical structure in which every serial communication of the communication block has the same number of communications, all processes take part in the communication, and the difference between adjacent process numbers in each row is equal; the first logical structure is represented by the expression P0 F [P1, P2, P3] F_, where:
P0 is the first process number of the communication block;
P1 indicates whether the head and tail process numbers of a serial communication are the same;
P2 is the difference between adjacent process numbers in a serial communication;
P3 is the number of distinct (non-repeated) processes in a serial communication;
F is the difference between the numbers of corresponding processes when different rows are compared, F = (Fi)...(F2)(F1);
F_ is the number of times [P1, P2, P3] needs to be repeated, F_ = *N1*N2...*Ni;
i is a natural number giving the number of times the expression of the first logical structure needs to be expanded; Ni is the number of repetitions at each expansion of the expression of the first logical structure from the outside inward, and the first process number of successive repetitions differs by Fi.
13. The communication track expanding method according to claim 10, characterized in that forming the logical representation of the plurality of algorithm events further comprises forming the logical structure of a communication block, the logical structure of the communication block being a second logical structure in which every serial communication of the communication block has the same number of communications, all processes take part in the communication, and the process numbers in each row form an arithmetic progression; the second logical structure is represented by the expression <H0|S>{H1, H2}G, where:
H0 is the minimum process number in a serial communication;
H1 indicates whether the process numbers of the first-column and last-column communications are the same;
H2 is the difference between adjacent process numbers after they are rearranged in descending order;
S is the position of the minimum process number;
G is the number of times {H1, H2} needs to be repeated, G = *M1*M2...*Mj;
j is a natural number giving the number of times the expression of the second logical structure needs to be expanded; Mj is the number of repetitions at each expansion of the expression of the second logical structure from the outside inward.
14. A communication track drive simulation method, characterized by comprising:
Generating a target communication track file using the communication track expanding method according to any one of claims 1 to 13;
Inputting the generated target communication track file into a target machine architecture simulator and performing a simulation run.
15. A communication track expanding device, characterized by comprising:
An algorithm event forming unit, configured to form an algorithm event based on the original communication track file generated by running an application program at one operation scale;
A control unit, configured to control the algorithm event forming unit so that the application program is run at a plurality of different operation scales to form a plurality of algorithm events, the number of participating processes differing among the operation scales;
A logic generation unit, configured to determine the relation among the plurality of algorithm events, the relation reflecting the relation between the communication track and the number of processes and the relation between the communication track and the inter-process topology;
A target communication track file generation unit, configured to generate, based on the relation among the plurality of algorithm events, the target communication track file of the application program running at a target scale, the operation scale being smaller than the target scale;
The algorithm event forming unit comprises:
An extraction unit, configured to extract the original communication track file of each process of the application program running at one operation scale;
A syntax analysis unit, configured to match the attributes of the atomic events in the original communication track file, determine the grammatical relation among the atomic events, and organize the atomic events according to the grammatical relation to form a grammatical event sequence, wherein the determining of the grammatical relation among the atomic events by the syntax analysis unit comprises identifying the communication pattern among the atomic events;
An algorithm analysis unit, configured to determine the meaning and mutual relation of the grammatical events in the grammatical event sequence and to construct an algorithm event according to the mutual relation of the grammatical events, the algorithm event comprising one or more of a loop, a loop nest and a branch.
16. The communication track expanding device according to claim 15, characterized in that the algorithm event forming unit further comprises an event analysis unit connected to the extraction unit and to the parsing unit, configured to analyze the atomic events of the original communication track file and extract a plurality of attributes of each atomic event before the attributes of the atomic events in the original communication track file are matched; the attributes comprise the name and parameter values of a message passing interface function and the file name and line number at which the function occurs.
17. The communication track expanding device according to claim 15, characterized in that the parsing unit generates a first communication track file for each process after matching the attributes of the atomic events in the original communication track file, and the parsing unit identifying the communication pattern among the atomic events comprises:
taking one communication track from the first communication track file of each process and, if all of them are collective communication tracks, recording all of the collective communication tracks;
if all of them are point-to-point communication tracks, performing point-to-point communication grammatical analysis and completing the search for communication blocks;
ending the identification of the communication pattern among the atomic events when all communication tracks in the first communication track file of each process have been processed.
18. The communication track expanding device according to claim 17, characterized in that the parsing unit performing point-to-point communication grammatical analysis comprises:
taking out the track information contained in the point-to-point communication track, the track information comprising a message type, a source process number, a target process number, a message identifier and a communication domain;
judging whether the temporarily stored track information contains track information that pairs with it; if so, recording the pairing; if not, temporarily storing the track information of the point-to-point communication track.
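As an illustration only, the pairing step of claim 18 can be sketched in Python as follows. The names `TraceInfo`, `is_pair` and `match_point_to_point` are hypothetical, and the pairing rule (a send and a receive that agree on source, target, message identifier and communication domain) is an assumption; the claim itself does not spell out the matching criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceInfo:
    """Hypothetical record holding the five fields named in claim 18."""
    kind: str    # message type, assumed here to be "send" or "recv"
    source: int  # source process number
    target: int  # target process number
    tag: int     # message identifier
    comm: str    # communication domain


def is_pair(a, b):
    """Assumed rule: one send and one recv that agree on every other field."""
    return ({a.kind, b.kind} == {"send", "recv"}
            and a.source == b.source
            and a.target == b.target
            and a.tag == b.tag
            and a.comm == b.comm)


def match_point_to_point(records):
    """Pair each record with a temporarily stored partner, or store it."""
    pending = []  # temporarily stored, still unpaired track information
    pairs = []    # recorded pairings as (send, recv) tuples
    for rec in records:
        partner = next((p for p in pending if is_pair(p, rec)), None)
        if partner is None:
            pending.append(rec)  # no partner yet: keep it in temporary storage
        else:
            pending.remove(partner)
            pairs.append((partner, rec) if partner.kind == "send" else (rec, partner))
    return pairs, pending
```

Records left in `pending` when the input is exhausted are the ones that the communication block search of claim 19 would still have to resolve.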
19. The communication track expanding device according to claim 18, characterized in that the parsing unit completing the search for communication blocks comprises:
for each temporarily stored communication track, searching for the communication track of the process that pairs with it and checking whether a send-receive pair can be formed; if so, recording the pairing; otherwise, temporarily storing the track information contained in that communication track;
for an incomplete communication block, searching for the communication tracks of the corresponding processes and performing the point-to-point communication grammatical analysis;
judging whether all incomplete communication blocks, together with the newly formed send-receive pairs, have formed a communication block; if so, recording the communication block;
judging whether any incomplete communication block remains; if so, continuing the search for communication blocks; otherwise, ending the search for communication blocks.
20. The communication track expanding device according to claim 19, characterized in that the parsing unit judging whether all incomplete communication blocks, together with the newly formed send-receive pairs, have formed a communication block and, if so, recording the communication block comprises:
adding the newly formed send-receive pair to the incomplete communication block;
judging that a communication block has been formed when every process has participated in communication one or more times;
recording the communication block thus formed.
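The completion test of claim 20 can be sketched as follows, purely as an illustration. The function name `build_blocks`, the representation of a send-receive pair as a `(source, target)` tuple, and the assumption that processes are numbered `0 .. num_procs - 1` are all hypothetical choices made for this example.

```python
def build_blocks(pairs, num_procs):
    """Group send-receive pairs into communication blocks.

    A block is treated as complete once every process number in
    range(num_procs) has taken part in at least one pair.
    """
    blocks = []    # completed communication blocks
    current = []   # the incomplete block being filled
    seen = set()   # process numbers that have participated in `current`
    for src, dst in pairs:
        current.append((src, dst))  # add the new pair to the incomplete block
        seen.update((src, dst))
        if len(seen) == num_procs:  # all processes participated: block formed
            blocks.append(current)
            current, seen = [], set()
    return blocks, current
```

The second return value is the block that was still incomplete when the pairs ran out, which corresponds to the "incomplete communication block" that claim 19 keeps searching for.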
21. The communication track expanding device according to claim 17, characterized in that the algorithm analysis unit determining the meaning and mutual relation of the grammatical events in the grammatical event sequence comprises finding a circulation law that takes communication modules as the loop body; the communication modules are contained in the second communication track file that the parsing unit generates after completing the search for communication blocks, each communication module comprises at least one communication block, the circulation law comprises the circulation framework and the cycle count of the loop body, and the algorithm analysis unit finding the circulation law comprises:
determining the minimum communication block quantity len among all loop bodies, where len is a natural number;
finding the loops whose loop body consists of len communication blocks;
merging the loops found and updating the second communication track file;
repeating the above operations until no loop remains in the second communication track file, thereby generating a third communication track file having the circulation law.
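One way to picture the repeated merging of claim 21 is the following sketch. It is an assumption-laden illustration, not the patented procedure: communication blocks are reduced to plain hashable tokens, a merged loop is represented by a hypothetical `("loop", body, count)` tuple, and the functions `fold_once` and `fold_all` are invented for this example.

```python
def fold_once(seq):
    """Merge the loops with the shortest loop body; return (new_seq, changed)."""
    for body_len in range(1, len(seq) // 2 + 1):  # smallest len first
        out, i, changed = [], 0, False
        while i < len(seq):
            body = seq[i:i + body_len]
            count = 1
            # Count how many times the body repeats back to back.
            while seq[i + count * body_len:i + (count + 1) * body_len] == body:
                count += 1
            if count > 1:
                out.append(("loop", tuple(body), count))
                i += count * body_len
                changed = True
            else:
                out.append(seq[i])
                i += 1
        if changed:
            return out, True
    return seq, False


def fold_all(seq):
    """Repeat the merge until no loop remains to be merged."""
    changed = True
    while changed:
        seq, changed = fold_once(seq)
    return seq
```

Because a merged loop is itself a token, a later pass can fold it into an enclosing loop, which is how nested loops emerge in this sketch.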
22. The communication track expanding device according to claim 21, characterized in that the algorithm analysis unit finding the loops whose loop body consists of len communication blocks comprises:
finding, in the second communication track file, all communication modules comprising len communication blocks; starting from the 1st of these communication modules, comparing each pair of adjacent communication modules in turn until the n-th communication module differs from the (n+1)-th communication module, whereupon a loop whose loop body consists of len communication blocks has been found, the cycle count of the loop body being n;
continuing to compare adjacent communication modules in turn, starting from the (n+1)-th communication module, until all communication modules comprising len communication blocks have been compared; where n is a natural number and n>1.
23. The communication track expanding device according to claim 22, characterized in that the logic generation unit comparing two adjacent communication modules in turn comprises:
comparing the corresponding communication blocks in the two adjacent communication modules in turn; if the m-th communication block in the n-th communication module differs from the m-th communication block in the (n+1)-th communication module, judging that the n-th communication module differs from the (n+1)-th communication module; if all corresponding communication blocks in the two communication modules are identical, judging that the n-th communication module is identical to the (n+1)-th communication module; where m is a natural number and m≤len.
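The adjacent-module comparison of claims 22 and 23 can be sketched as below. This is a minimal illustration under stated assumptions: each communication module is given as a list of len communication blocks, blocks are compared with plain equality, and the names `modules_equal` and `find_loops` are hypothetical.

```python
def modules_equal(module_a, module_b):
    """Compare corresponding communication blocks one by one (claim 23)."""
    if len(module_a) != len(module_b):
        return False
    for block_a, block_b in zip(module_a, module_b):
        if block_a != block_b:  # the m-th blocks differ, so the modules differ
            return False
    return True


def find_loops(modules):
    """Return (start_index, cycle_count) for each run of identical modules."""
    loops = []
    start = 0
    while start < len(modules):
        end = start
        # Extend the run while each module equals its right-hand neighbour.
        while end + 1 < len(modules) and modules_equal(modules[end], modules[end + 1]):
            end += 1
        count = end - start + 1
        if count > 1:  # the claim requires n > 1 for a loop
            loops.append((start, count))
        start = end + 1  # resume the comparison at the (n+1)-th module
    return loops
```

For example, with modules `[A, A, A, C, E, E]` the sketch reports one loop of three cycles starting at index 0 and one loop of two cycles starting at index 4.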
24. The communication track expanding device according to claim 21, characterized in that the logic generation unit determining the relation among the plurality of algorithm events comprises forming a logical representation of the plurality of algorithm events, forming the logical representation of the plurality of algorithm events comprises forming a logical structure of the circulation framework, and the logic generation unit forming the logical structure of the circulation framework comprises:
running the application program k times with different numbers of participating processes to obtain k third communication track files and their k circulation frameworks, each circulation framework comprising at least one first circulation framework, where k is a natural number and k>1;
taking from each third communication track file the one first circulation framework having the largest number of communications, thereby obtaining k first circulation frameworks corresponding to the k circulation frameworks;
comparing whether the k first circulation frameworks are identical; if identical, merging the cycle counts of the corresponding first circulation frameworks; if different, merging the cycle counts on the basis of the first circulation framework with the largest number of processes;
repeating the operations of taking from each third communication track file the not-yet-compared first circulation framework having the largest number of communications, comparing, and merging, until all first circulation frameworks have been compared;
wherein, in the operation of taking from each third communication track file the one first circulation framework having the largest number of communications, when more than one first circulation framework has the largest number of communications, the logic generation unit takes the first circulation framework that appears earliest in the third communication track.
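A single selection-and-merge round of claim 24 might look like the following sketch. Several points are assumptions: the `Framework` record and its fields are hypothetical, and "merging the cycle counts" is interpreted here as collecting the cycle count observed at each process count, since the claim does not define the merge operation further.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    """Hypothetical summary of one first circulation framework."""
    body: tuple      # loop body, used to test whether frameworks are identical
    comm_count: int  # number of communications in the framework
    cycles: int      # cycle count of the loop body


def pick_largest(frameworks):
    """Take the framework with the most communications.

    max() returns the first maximal item, so ties go to the framework
    that appears earliest in the track, as the claim requires.
    """
    return max(frameworks, key=lambda fw: fw.comm_count)


def merge_round(runs):
    """runs maps a process count to that run's frameworks in track order."""
    picked = {procs: pick_largest(fws) for procs, fws in runs.items()}
    identical = len({fw.body for fw in picked.values()}) == 1
    base = picked[max(picked)]  # framework of the run with the most processes
    return {
        "body": base.body,
        "identical": identical,
        "cycles_by_procs": {p: picked[p].cycles for p in sorted(picked)},
    }
```

The resulting table of cycle counts per process count is the kind of data from which a relation between cycle count and process count could later be derived for the target scale.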
25. The communication track expanding device according to claim 24, characterized in that each first circulation framework comprises at least one second circulation framework, and the logic generation unit merging the cycle counts on the basis of the first circulation framework with the largest number of processes comprises:
comparing the second circulation framework having the largest number of communications in the first circulation framework with the largest number of processes against the second circulation framework having the largest number of communications in each of the other first circulation frameworks, and merging the cycle counts;
comparing the not-yet-compared second circulation frameworks in the first circulation framework with the largest number of processes and merging the cycle counts, until all second circulation frameworks in the first circulation framework with the largest number of processes have been compared.
26. A communication track driven simulation system, characterized by comprising: a target machine architecture simulation device and the communication track expanding device according to any one of claims 15 to 25;
the communication track expanding device being configured to generate, by expansion, the target communication track file for the application program running at the target scale, and to input it to the target machine architecture simulation device;
the target machine architecture simulation device being configured to perform a simulated run driven by the target communication track file and to produce a simulation result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110110818.5A CN102760085B (en) 2011-04-29 2011-04-29 Communication track expanding method and device, communication track drive simulation method and system

Publications (2)

Publication Number Publication Date
CN102760085A (en) 2012-10-31
CN102760085B (en) 2014-10-22

Family

ID=47054550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110110818.5A Active CN102760085B (en) 2011-04-29 2011-04-29 Communication track expanding method and device, communication track drive simulation method and system

Country Status (1)

Country Link
CN (1) CN102760085B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838870B (en) * 2014-03-21 2016-09-28 武汉科技大学 The news atomic event abstracting method merged based on information unit
CN107172656B (en) * 2016-03-07 2021-01-22 京东方科技集团股份有限公司 Non-blocking request processing method and device
CN113014439B (en) * 2021-04-19 2021-10-26 广州大一互联网络科技有限公司 Virtual elastic management method for data center bandwidth
CN116704559B (en) * 2023-07-28 2023-11-03 南京大学 Quantum fingerprint identification method and system based on asynchronous two-photon interference

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101650687A (en) * 2009-09-14 2010-02-17 清华大学 Large-scale parallel program property-predication realizing method
CN101661409A (en) * 2009-09-22 2010-03-03 清华大学 Extraction method of parallel program communication mode and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
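"A Grouped Cooperative Checkpointing Algorithm for Large-Scale Parallel Systems" (《一种面向大规模并行系统的分组协同检查点算法》); 黄琼、尚利宏、周密、金惠华; CNKI; 2010-07-24; pp. 1-5 *
"Research on the Tracing/Replay Mechanism for Debugging Concurrent Programs" (《并发程序调试的追踪/重演机制研究》); 曾奕; CNKI; 2005-03-25; pp. 1-47 *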

Also Published As

Publication number Publication date
CN102760085A (en) 2012-10-31

Similar Documents

Publication Publication Date Title
CN100476819C (en) Data mining system based on Web and control method thereof
Dávid et al. Foundations for streaming model transformations by complex event processing
CN103761080B (en) Structured query language (SQL) based MapReduce operation generating method and system
Perez et al. Ringo: Interactive graph analytics on big-memory machines
Navaridas et al. Simulating and evaluating interconnection networks with INSEE
CN112748914B (en) Application program development method and device, electronic equipment and storage medium
CN111708641B (en) Memory management method, device, equipment and computer readable storage medium
CN103336694A (en) Entity behavioral modeling assembling method and system
CN102176200A (en) Software test case automatic generating method
CN102760085B (en) Communication track expanding method and device, communication track drive simulation method and system
CN101673198A (en) Method for verifying consistency of dynamic behavior in UML model and time-sequence contract
CN101504688A (en) HLA based simulation software interaction method
Bellettini et al. Mardigras: Simplified building of reachability graphs on large clusters
Xiao et al. OpenABLext: An automatic code generation framework for agent‐based simulations on CPU‐GPU‐FPGA heterogeneous platforms
Li et al. Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging
CN102385511A (en) Visualization of runtime analysis across dynamic boundaries
CN105242958A (en) Virtual testing system and HLA simulation system data exchange method
CN111225034B (en) WebService-based dynamic integration method and assembly of water environment safety regulation and control model
Elouasbi et al. Deterministic rendezvous with detection using beeps
CN105302551B (en) A kind of method and system of the Orthogonal Decomposition construction and optimization of big data processing system
CN112148392A (en) Function call chain acquisition method and device and storage medium
CN110928705B (en) Communication characteristic analysis method and system for high-performance computing application
CN111143208B (en) Verification method for assisting FPGA to realize AI algorithm based on processor technology
CN107168298A (en) Ladder diagram dynamic analysis method
CN102760097B (en) Computer architecture performance simulation method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant