CN116302592A - Message transmission system between master core and slave core based on local memory


Info

Publication number: CN116302592A
Application number: CN202310075604.1A
Authority: CN (China)
Prior art keywords: core, message, slave, master, queue
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 陈虎, 周鹏灵
Assignees: Guangdong Science & Technology Infrastructure Center; South China University of Technology SCUT
Application filed by Guangdong Science & Technology Infrastructure Center and South China University of Technology SCUT

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/54 - Interprogram communication
    • G06F 9/546 - Message passing systems or structures, e.g. queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/70 - Software maintenance or management
    • G06F 8/71 - Version control; Configuration management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/70 - Software maintenance or management
    • G06F 8/76 - Adapting program code to run in a different environment; Porting
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a message passing system between a master core and slave cores based on local memory. The system provides a universal message-passing programming interface on different platforms such as x86 microprocessors, the SW26010 processor, and the heterogeneous fusion accelerator for E-level (exascale) computing. Compared with traditional programming against the unique interfaces of each domestic high-performance many-core processor, the method has the following advantages: the programming model is simple and easy to learn, which reduces programming difficulty; application software can be quickly migrated between different types of domestic high-performance microprocessors by modifying only the compiler configuration; and high-performance computing software can be developed and debugged with this model on an x86 platform and then ported to a domestic high-performance many-core processor, which effectively reduces development difficulty. These features will effectively improve the efficiency of developing and porting domestic high-performance computing software.

Description

Message transmission system between master core and slave core based on local memory
Technical Field
The present invention relates to the field of many-core processors, and in particular to a local memory-based message passing system between a master core and a slave core.
Background
(1) Domestic many-core processor architecture
As shown in FIG. 1, the SW26010 microprocessor (Haohuan FU, Junfeng LIAO. The Sunway TaihuLight supercomputer: system and applications [J]. Science China Information Sciences, 2016, 59(7): 1-16) contains 4 heterogeneous groups. Each heterogeneous group comprises one master core and a slave-core cluster of 64 slave cores running at 1.5 GHz, as shown in FIG. 2. The memory hierarchy of each heterogeneous group is the same and consists of the heterogeneous group memory (8 GB) and the slave cores' local storage. The master core has a 32 KB L1 data Cache and a 256 KB L2 Cache (data and instructions). Each slave core has 64 KB of local memory and 16 KB of instruction storage, and supports a 256-bit SIMD instruction set. A slave core may access master memory either by direct access or by DMA.
The accelerator chip for E-level (exascale) high-performance computing (Liu Sheng, Lu Kai, Guo Yang, Liu Zhong, Chen Haiyan, Lei Yuanwu, Sun Haiyan, Yang Qianning, Chen Xiaowen, Chen Shenggang, Liu Biwei, Lu Jianzhuang. A Self-Designed Heterogeneous Accelerator for Exascale High Performance Computing [J]. Journal of Computer Research and Development, 2021, 58(6): 1234-1237) adopts a heterogeneous fusion CPU+GPDSP architecture consisting of a multi-core CPU and 4 GPDSP clusters, as shown in FIG. 3. The multi-core CPU comprises 16 FT-C662 CPU cores. Each GPDSP cluster contains 6 DSP nodes (each DSP node contains 4 DSP cores). The multi-core CPU maintains Cache coherence in hardware and includes a 16 MB L2 Cache. Each GPDSP cluster uses a three-level storage structure of 80 MB private storage, 24 MB global shared storage and 32 GB HBM storage. Each DSP core includes a 64 KB private scalar memory (SM) and a 768 KB private vector memory (AM). The vector unit consists of 16 homogeneous VPE arrays and supports SIMD operations up to 1024 bits wide.
(2) Architecture abstraction of domestic many-core processor
Taking SW26010 and the accelerator chip for E-level high-performance computing as examples, domestic many-core high-performance microprocessors have the following characteristics:
1. They adopt an asymmetric structure with a small number of complex master cores and many simpler compute cores; the master processor handles complex logic-control tasks, while the coprocessor handles large-scale data-parallel tasks with high computational density and simple logic branches.
2. Each compute core has an independent local memory space, and these memory spaces are not Cache-coherent; the programmer must explicitly control data exchange between system main memory and each compute core's local memory.
3. There are two methods of data exchange between the master core and a slave core: 1) the slave core directly accesses the master core's memory space, which has long latency and is only suitable for transferring control information; 2) a DMA transfer initiated by the slave core, which can move larger amounts of data.
4. The slave cores support SIMD instructions, with different processors differing in SIMD width.
5. There is no operating-system support for multiple processes (threads) on a slave core; only one thread runs on each slave core. Different processors have different slave-core thread programming interfaces.
These two different types of many-core processors may be described by the abstract structure depicted in FIG. 4. One master core and N slave cores form a complete processor cluster. The master core accesses main memory through the on-chip Cache. Each slave core has a local memory without a Cache coherence protocol, and DMA completes data exchange between master-core memory and slave-core local memory. Each slave core provides a SIMD instruction set whose data width varies between processors. Table 1 gives the main architectural parameters of SW26010 and the heterogeneous fusion accelerator for E-level computing.
(3) Existing multi-core processor programming model
OpenMP (DE SUPINSKI B R, SCOGLAND T R W, DURAN A, et al. The ongoing evolution of OpenMP [J]. Proceedings of the IEEE, 2018, 106(11): 2004-2019.) is a common multi-threaded programming interface on current symmetric multiprocessor systems and is widely supported. Applications developed on this standard have good portability.
Cilk (Leiserson, Charles E.; Plaat, Aske (1998). "Programming parallel applications in Cilk". SIAM News. 31.) is a task-based multithreaded parallel programming extension. On this basis, Cilk++ (Leiserson C E. The Cilk++ concurrency platform [J]. The Journal of Supercomputing, 2010, 51(3): 244-257.) extends C/C++ with three keywords: _Cilk_for, _Cilk_spawn and _Cilk_sync. The runtime schedules tasks among worker threads in a divide-and-conquer fashion to keep the threads load-balanced.
Intel proposed the open-source Threading Building Blocks (TBB) library (Anonymous. "Intel Threading Building Blocks: Outfitting C++ for multi-core processor parallelism," SciTech Book News, vol. 32, (3), 2008; REINDERS J. Intel Threading Building Blocks: Outfitting C++ for multi-core processor parallelism. 1st edition). TBB uses tasks as its scheduling unit and is portable across POSIX and Windows thread libraries. In 2018, Intel published the oneAPI software programming framework. oneAPI aims to provide a unified programming model and application programming interface for CPUs, GPUs, FPGAs, neural-network processors and other hardware accelerators. The core of oneAPI is the Data Parallel C++ programming language (James Reinders et al. Data Parallel C++ [M]. Apress, Berkeley, CA, 2021; Gerhard R. Joubert, Hugh Leather, Mark Parsons, Frans Peters, Mark Sawyer, Ruyman Reyes, Victor Lomüller. SYCL: Single-source C++ accelerator programming [J]. Advances in Parallel Computing, 2016, 27.), which is essentially an extension of C++ that adds support for the SYCL programming model; it supports data-parallel and heterogeneous programming across CPUs and accelerators to simplify programming and improve code reusability across different hardware, while still allowing tuning for a specific accelerator.
Taking SW26010 and the heterogeneous fusion accelerator for E-level computing as examples: the SW26010 many-core processor provides the Athread function library, which creates and manages threads, with one slave core bound to each thread. In Athread, the master-core interface is responsible for operations such as thread creation and reclamation, thread scheduling control, interrupt and exception management, and asynchronous mask support. The slave-core interface is responsible for initiating data transfers, executing core computation, thread identification, sending interrupts and other operations.
The hthread multithreaded programming interface is used on the heterogeneous fusion accelerator for E-level computing. It comprises a master-core (host-side) programming interface and a slave-core (device-side) programming interface. The host-side interface mainly covers device management, image management, thread management, device-side storage management and device-side shared-resource management; the device-side interface mainly covers parallel management, DSP on-chip storage management, synchronization management, interrupt/exception handling and vectorization functions.
In summary, improving the portability of application software across hardware platforms has become a major direction of international high-performance programming models. However, the architectures and operating systems of domestic high-performance many-core microprocessors have their own characteristics: the existing programming models are difficult to use directly, the native interfaces are not interchangeable, and this seriously hinders the development of domestic high-performance software. The current SIMD programming models also have problems: OpenMP and Cilk++ require compiler support, MAL only supports macros for part of the ISA and cannot be used on domestic many-core processors, and the vc library and gSIMD encapsulate SIMD instructions so that the user does not operate vector instructions directly, the library fixes the vector width, and the supported instruction set is very limited.
Disclosure of Invention
The SW26010 and the heterogeneous fusion accelerator for E-level computing, both independently developed in China, are high-performance many-core processors. They adopt a small number of master cores and many slave cores, and the slave cores use local memories without Cache coherence, which differs greatly from the traditional SMP (symmetric multiprocessor) and CC-UMA (cache-coherent uniform memory access) structures. Meanwhile, interfaces such as slave-core thread usage and local-memory data transfer are unique to each processor and differ greatly from common international standards. This directly leads to two problems: 1) developing software for domestic high-performance many-core processors requires using the lowest-level interfaces directly, and since debugging is generally only possible over a remote connection to the supercomputing center, such software is difficult to develop; 2) domestic software cannot be reused across different domestic high-performance many-core processors, so the already weak domestic software development effort becomes even more fragmented and much work is duplicated.
The invention provides a message transmission system between the master core and the slave cores based on local memory. It offers a general method for message passing between the master core and the slave cores and provides a universal message-passing programming interface on different platforms such as x86 microprocessors, the SW26010 processor, and the heterogeneous fusion accelerator for E-level computing. Compared with traditional programming against the unique interfaces of each domestic high-performance many-core processor, the method has the following advantages: 1) the programming model is simple and easy to learn, which reduces programming difficulty; 2) application software can be quickly migrated between different types of domestic high-performance microprocessors by modifying only the compiler configuration; 3) high-performance computing software can be developed and debugged with this model on an x86 platform and then ported to a domestic high-performance many-core processor, which effectively reduces development difficulty. These features will effectively improve the efficiency of developing and porting domestic high-performance computing software.
The object of the invention is achieved by at least one of the following technical solutions.
A message transmission system between a master core and slave cores based on local memory includes a master core set M, whose elements are denoted m_1, …, m_|M|, where |M| represents the number of master cores in M. Each master core m_i corresponds to one or more slave core sets S_a, satisfying |S_a| = |S_b|, where |S_a| represents the number of slave cores in slave core set S_a and 1 ≤ a, b ≤ |M|;
through the slave-core thread management interface, a master core m_i can manage the slave core set S_i, 1 ≤ i ≤ |M|;
when creating the k-th message queue q_{i,j,k} from the i-th master core m_i to the j-th slave core s_{i,j} in the i-th slave core set S_i, where s_{i,j} ∈ S_i, 1 ≤ j ≤ |S_i| and 1 ≤ k, the calling interface can be used to create the corresponding message queue q_{i,j,k} in the memories of master core m_i and slave core s_{i,j}; all message queues q_{i,j,k} from master core m_i to slave core s_{i,j} form the set Q, q_{i,j,k} ∈ Q, which establishes the connection between master core m_i and slave core s_{i,j};
master core m_i or slave core s_{i,j} sends a series of messages r_x through the message sending mechanism into message queue q_{i,j,k}, yielding a message sequence set R; the messages in R are sent in order, where 1 ≤ x and r_x ∈ R;
slave core s_{i,j} or master core m_i selects the corresponding message r_x from the message sequence R according to message queue q_{i,j,k}, where r_x ∈ R and 1 ≤ x ≤ |R|; after the user obtains message r_x and completes the user-defined processing of r_x, message queue q_{i,j,k} releases the memory used by message r_x;
after slave core s_{i,j} has processed its data, the cache of slave core s_{i,j} is logged off; the thread running on master core m_i reclaims slave core s_{i,j} and continues processing the tasks of master core m_i; if there is no task left, master core m_i logs off its cache and the multi-threaded parallel program ends.
Further, creating a message queue on the primary core requires specifying the following parameters:
the message queue name qName (a string), the number slaveID of the connected slave core, the message size msgSize, the number of messages mSize held in the master core portion of the queue, the number of messages sSize held in the slave core portion, the starting address mQaddr of the master-core part of the queue, the memory type sType occupied by the queue on the slave core, and the direction of the queue; after a successful call, a handle number handle is returned;
wherein the master core identifies a queue entity with (slave core number, handle number) or (slave core number, queue name); the slave core takes the handle number or the queue name as the unique identification number of the queue to determine a unique queue entity; the handles of the same queue on the master core are the same as the handles on the slave cores;
the message queue is only used for communication between the master core and a slave core, and the user can specify the slave core slaveID where the queue is located; between a given pair of master core and slave core, a plurality of different message queues may be created;
the size of each message in the message queue is not greater than msgSize bytes;
a message queue is distributed in a main core memory and a local memory of a slave core, and the number of the messages held by the main core memory and the local memory is mSize and sSize respectively;
the initial address of the message queue on the main core memory is a continuous memory space designated by an application program, and the initial address is mQaddr;
if the local memories on the slave cores are of different types, the type of local memory occupied by the message queue may be specified by the slave core memory type sType;
each message queue has a single direction, either master-core-write/slave-core-read or slave-core-write/master-core-read, specified by the direction parameter;
the master core can create a plurality of message queues between the master core and one slave core, and the message queues between the master core and all the slave cores form a message queue set;
the master core controls slave-core threads through the slave-core thread management interface, whose main functions are: creating and starting a slave-core thread group, waiting for the thread group to terminate, closing the thread group, and loading an image file from the master core to the device.
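By way of illustration, the following C sketch shows what creating such a queue from the master core might look like. The exact prototype of mCreateQueue() (interface M12 below) is not given in this description, so the parameter order, the return convention and the direction/memory-type constants used here are assumptions.

```c
#include <stdint.h>

/* Assumed constants; the description only names the concepts. */
#define QDIR_MASTER_TO_SLAVE 0   /* master core writes, slave core reads */
#define QDIR_SLAVE_TO_MASTER 1   /* slave core writes, master core reads */
#define SMEM_DEFAULT         0   /* default slave-core local memory type (sType) */

/* Assumed prototype: returns a handle number on success, negative on failure. */
int mCreateQueue(const char *qName, int slaveID, int msgSize,
                 int mSize, int sSize, void *mQaddr, int sType, int direction);

static char queueArea[16 * 256];  /* contiguous master-memory space, used as mQaddr */

int create_example_queue(void)
{
    /* One master-to-slave queue to slave core 0: 256-byte messages,
       16 message blocks on the master side, 4 blocks in slave local memory. */
    int handle = mCreateQueue("cmdQ", /*slaveID=*/0, /*msgSize=*/256,
                              /*mSize=*/16, /*sSize=*/4,
                              queueArea, SMEM_DEFAULT, QDIR_MASTER_TO_SLAVE);
    return handle;  /* (slaveID, handle) identifies this queue on the master core */
}
```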
Further, a message queue has a continuous storage space for message contents in both the master core part and the slave core part; the numbers of messages they can hold are mSize and sSize respectively, occupying mSize × msgSize bytes and sSize × msgSize bytes of memory; the capacity of the slave-core part of the queue is limited by the capacity of the local memory;
the control information layout of each message queue is divided into two parts: a status list and a location index;
the position indices are: IMTran, IMReady, IMLocked and IMIdle, associated with master-core block positions, and ISTran, ISReady, ISLocked and ISIdle, associated with slave-core block positions; their placement differs with the queue direction. In the control-information layout of a master-to-slave message queue, IMLocked and IMIdle are stored in the master-core address area, while IMTran, IMReady and the four IS* indices are located in the slave-core local memory. In the control-information layout of a slave-to-master message queue, IMReady, IMLocked and IMIdle are stored in the master-core address area, while IMTran and the four IS* indices are located in the slave-core local memory;
IMTran indicates that the first message block state in the main core space is the message position index in transmission; IMReady represents the message location index in the main core space where the first message block state is ready for a message; IMLocked represents the first message block state in the main core space as the message position index in the message lock; IMIdle indicates that the first message block state in the main core space is the message position index in the message idle;
ISTran indicates that the first message block state in kernel space is the message location index in transmission; ISReady represents the message location index that is ready for a message from the first message block state in the kernel space; ISLocked denotes the message location index from the first message block state in core space in message lock; ISIdle indicates that the first message block state in the kernel space is the message position index in the message idle;
each state in the state list corresponds to each message block in the annular message block data area one by one; the message block state list of the master core part and the message block state list of the slave core part are respectively marked as MState and SState and are respectively positioned in a master core address area and a slave core local memory;
a message queue is divided into a master core part and a slave core part;
when the message queue is created, the number of messages that the master core portion and the slave core portion can accommodate is already determined;
the position indices in the message queue control-information layout are placed differently for different queue directions: variables the master core does not need are stored in the slave-core local memory, which reduces accesses by slave-core code to master-core variables and improves the performance of the model.
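To make the layout above concrete, the sketch below groups the position indices and state lists of a master-to-slave queue into two C structures, one resident in the master core address area and one in the slave-core local memory. The field names follow the description; the structure names, integer types and the fixed array sizes (which stand in for mSize and sSize) are assumptions.

```c
#include <stdint.h>

typedef uint8_t MsgState;        /* one state value per message block */

/* Control information kept in the master core address area
   (master-to-slave queue): only the indices the master core needs. */
typedef struct {
    volatile uint32_t IMLocked;   /* first master-part block in the locked state  */
    volatile uint32_t IMIdle;     /* first master-part block in the idle state    */
    MsgState          MState[16]; /* state list of the master-part blocks (mSize) */
} MasterCtrlM2S;

/* Control information kept in the slave-core local memory. */
typedef struct {
    volatile uint32_t IMTran;     /* first master-part block in transfer          */
    volatile uint32_t IMReady;    /* first master-part block with a ready message */
    volatile uint32_t ISTran, ISReady, ISLocked, ISIdle;  /* slave-part indices   */
    MsgState          SState[4];  /* state list of the slave-part blocks (sSize)  */
} SlaveCtrlM2S;
```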
Further, in a master-to-slave message queue, the states of a message block in the master core portion include: MasterIdle, MasterLocked, MasterReady and MTransferring; the states of a message block in the slave core portion include: SlaveIdle, STransferring, SlaveReady and SlaveLocked; the state information of each message block is stored in the respective memory;
After the message queue is created, all message blocks of the master core part are in a MasterIdle state, and all message blocks of the slave core part are in a SlaveIdle state;
MasterIdle indicates that the message block in the main core is in an idle and allocable state, masterLocked indicates that the message block in the main core is in a locking state, masterReady indicates that the message block in the main core is in a ready and available state, and MTransferring indicates that the message block in the main core is in a transmission state;
SlaveIdle indicates that the message block in the slave core is in an idle and allocable state, STransferring indicates that the message block in the slave core is in a transmission state, slaveReady indicates that the message block in the slave core is in a ready and usable state, and SlaveLocked indicates that the message block in the slave core is in a locking state;
the interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
M1, mAllocateMsg(): obtain the address of a message block in the master core part of a message queue;
M2, mSendMsg(): start the master core transferring a message to the slave core;
M3, mRecvMsg(): receive a message sent by the slave core;
M4, mReleaseMsg(): release a message block of the master core part;
the interfaces the message queue system provides for the slave core application program include:
S1, sRecvMsg(): receive a message sent by the master core;
S2, sReleaseMsg(): release a message block of the slave core part;
S3, sAllocateMsg(): obtain the address of a message block in the slave core part of a message queue;
S4, sSendMsg(): start the slave core transferring a message to the master core;
among these interfaces, M1, M2, S1 and S2 are used for the master core to pass messages to the slave core, and M3, M4, S3 and S4 are used for the slave core to pass messages to the master core.
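The description gives only the interface names; the hedged C sketch below shows plausible prototypes so that the operation sequences that follow are easier to read. The parameter lists, the use of a (slaveID, handle) pair on the master side and of a bare handle on the slave side, and the return of NULL when no message is available are all assumptions.

```c
/* Master-side interfaces (M1-M4), assumed prototypes. */
void *mAllocateMsg(int slaveID, int handle);   /* M1: get a free master-part block  */
int   mSendMsg(int slaveID, int handle);       /* M2: mark the block ready to send  */
void *mRecvMsg(int slaveID, int handle);       /* M3: receive a slave-sent message  */
int   mReleaseMsg(int slaveID, int handle);    /* M4: release a master-part block   */

/* Slave-side interfaces (S1-S4), assumed prototypes. */
void *sRecvMsg(int handle);                    /* S1: receive a master-sent message */
int   sReleaseMsg(int handle);                 /* S2: release a slave-part block    */
void *sAllocateMsg(int handle);                /* S3: get a free slave-part block   */
int   sSendMsg(int handle);                    /* S4: mark the block ready to send  */
```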
Further, the sequence of operations for the master core to send a message to the slave core includes:
a1, the master core application calls mAllocateMsg(); the local-memory-based message passing system allocates the idle message block pointed to by position index IMIdle in the master core part of the queue, sets the block to the MasterLocked state, cyclically advances IMIdle, and returns the block address MasterMsg to the master core application;
a2, the master core application writes the message to be sent into the idle message block pointed to by MasterMsg;
a3, the master core application calls mSendMsg(); the system obtains the first message block MasterMsg pointed to by position index IMLocked, sets this block to the MasterReady state, and cyclically advances IMLocked;
a4, at an appropriate time, the message passing system allocates, for the message block to be transferred, the idle slave-part message block SlaveMsg pointed to by position index ISIdle, and cyclically advances ISIdle; it obtains the first message block MasterMsg pointed to by position index IMReady, sets MasterMsg to the MTransferring state and SlaveMsg to the STransferring state, and starts a DMA transfer of the message block from MasterMsg to SlaveMsg; after the DMA transfer finishes, the system sets the message block SlaveMsg pointed to by slave-core position index ISTran to the SlaveReady state and the message block MasterMsg pointed to by master-core position index IMTran to the MasterIdle state;
a5, the slave core application calls sRecvMsg(); the message queue returns the message block SlaveMsg pointed to by the slave-part position index ISReady to the slave core application and sets it to SlaveLocked;
a6, the slave core application reads the content of SlaveMsg;
a7, the slave core application calls sReleaseMsg(); the message queue sets slave-core message block SlaveMsg to the SlaveIdle state;
The sequence of operations for the slave core to send a message to the master core includes:
b1, the slave core application calls sAllocateMsg(); the local-memory-based message passing system allocates the idle message block pointed to by position index ISIdle in the slave core part of the queue, sets the block to the SlaveLocked state, cyclically advances ISIdle, and returns the block address SlaveMsg to the slave core application;
b2, the slave core application writes the message to be sent into the idle message block pointed to by SlaveMsg;
b3, the slave core application calls sSendMsg(); the system obtains the first message block SlaveMsg pointed to by position index ISLocked, sets this block to the SlaveReady state, and cyclically advances ISLocked;
b4, at an appropriate time, the message passing system allocates, for the message block to be transferred, the idle master-part message block MasterMsg pointed to by position index IMIdle; it sets MasterMsg to the MTransferring state and SlaveMsg to the STransferring state, and starts a DMA transfer of the message block from SlaveMsg to MasterMsg; after the DMA transfer finishes, the system sets the message block MasterMsg pointed to by master-core position index IMTran to the MasterReady state and the message block SlaveMsg pointed to by slave-core position index ISTran to the SlaveIdle state;
b5, the master core application calls mRecvMsg(); the message queue returns the message block address MasterMsg pointed to by the master-part position index IMReady to the master core application;
b6, the master core application reads the content of MasterMsg;
b7, the master core application calls mReleaseMsg(); the message queue sets master-core message block MasterMsg to the MasterIdle state;
the master core or slave core application directly reads and writes the content of a message block in the memory area managed by the message queue; the message content does not need to be moved to any other memory space. This reduces the overhead of moving message content and effectively reduces the use of slave-core local memory;
the master core or slave core application merely initiates sending or receiving of a message and does not need to care about how the message is actually transferred between the master core and the slave core; the transfer is carried out by the local-memory-based message passing system. This simplifies application design and gives the application better portability.
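Putting steps a1-a7 together, a master-to-slave transfer might look like the sketch below. It reuses the assumed prototypes from the earlier sketch; the message payload structure and the queue handle are purely illustrative.

```c
/* Assumed prototypes (see the earlier sketch). */
void *mAllocateMsg(int slaveID, int handle);
int   mSendMsg(int slaveID, int handle);
void *sRecvMsg(int handle);
int   sReleaseMsg(int handle);

typedef struct { int taskId; double data[24]; } WorkMsg;   /* illustrative payload */

/* Master core side: steps a1-a3. */
void master_send(int slaveID, int handle, int taskId)
{
    WorkMsg *m = (WorkMsg *)mAllocateMsg(slaveID, handle);  /* a1: block -> MasterLocked     */
    m->taskId = taskId;                                     /* a2: fill the message in place */
    mSendMsg(slaveID, handle);                              /* a3: block -> MasterReady      */
    /* a4 (the DMA into the slave's local memory) is done by the messaging system. */
}

/* Slave core side: steps a5-a7. */
int slave_receive(int handle)
{
    WorkMsg *m = (WorkMsg *)sRecvMsg(handle);   /* a5: block -> SlaveLocked       */
    int id = m->taskId;                         /* a6: read the content in place  */
    sReleaseMsg(handle);                        /* a7: block -> SlaveIdle         */
    return id;
}
```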
Further, the blocking type message transmission process between the master core and the slave core is specifically as follows:
Message queues will maintain a set of DMA requests DMAReqs in each message queue; the set is initialized to an empty set;
the slave core application calls the interface sRecvMsg(); inside sRecvMsg(), the following steps are performed:
A1. judge whether the DMA request set DMAReqs of the message queue is non-empty; if it is non-empty, execute step A2, otherwise execute step A3;
A2. check each request req in DMAReqs in turn to see whether its DMA has completed; if not, ignore it; if it has completed, set the state of req.SMsg to SlaveReady, set the state of req.MMsg to MasterIdle, and remove req from DMAReqs;
A3. judge whether a master-part message block MMsg in the MasterReady state and a slave-part message block SMsg in the SlaveIdle state can be acquired; if so, execute step A4, otherwise execute step A5;
A4. set the message block MMsg to the MTransferring state and the message block SMsg to the STransferring state, start an asynchronous DMA request of msgSize bytes from MMsg to SMsg, add the request req to DMAReqs, and execute step A3 again;
A5. if a message in the slave core part is in the SlaveReady state, set the earliest SlaveReady message Msg to the SlaveLocked state and return Msg to the application, ending the call; otherwise execute step A1;
where the DMA request set DMAReqs is initialized to empty;
the slave core application calls the interface sSendMsg(); inside sSendMsg(), the following steps are performed:
B1. judge whether the DMA request set DMAReqs of the message queue is non-empty; if it is non-empty, execute step B2, otherwise execute step B3;
B2. check each request req in DMAReqs in turn to see whether its DMA has completed; if not, ignore it; if it has completed, set the state of req.SMsg to SlaveIdle, set the state of req.MMsg to MasterReady, and remove req from DMAReqs;
B3. judge whether a slave-part message block SMsg in the SlaveReady state and a master-part message block MMsg in the MasterIdle state can be acquired; if so, execute step B4, otherwise execute step B5;
B4. set the message block MMsg to the MTransferring state and the message block SMsg to the STransferring state, start an asynchronous DMA request of msgSize bytes from SMsg to MMsg, add the request req to DMAReqs, and execute step B3 again;
B5. if the message block being sent by the slave core part is in the SlaveLocked state, set this SlaveLocked message Msg to the SlaveReady state and return to the application, ending the call; otherwise execute step B1;
where the DMA request set DMAReqs is initialized to empty.
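The following is a minimal C sketch of the blocking receive loop A1-A5 inside sRecvMsg() on the slave core. Every helper below stands in for book-keeping that the description leaves to the messaging system; none of them (nor the dma_* calls) is an interface defined by this description.

```c
typedef enum { MasterIdle, MasterLocked, MasterReady, MTransferring,
               SlaveIdle,  SlaveLocked,  SlaveReady,  STransferring } BlockState;

typedef struct { void *MMsg; void *SMsg; int dmaTag; } DmaReq;

extern int    dmaReqCount;                   /* current size of DMAReqs            */
extern DmaReq dmaReqs[];                     /* the per-queue DMA request set      */
extern int    dma_start(void *dst, const void *src, int bytes); /* async DMA, tag */
extern int    dma_done(int tag);             /* has this DMA finished?             */
extern void   set_state(void *block, BlockState s);
extern void   remove_req(int i);
extern void   add_req(void *MMsg, void *SMsg, int tag);
extern int    take_master_ready(int handle, void **MMsg);  /* MasterReady block?   */
extern int    take_slave_idle(int handle, void **SMsg);    /* SlaveIdle block?     */
extern void  *take_earliest_slave_ready(int handle);       /* earliest one or NULL */

void *sRecvMsg_blocking(int handle, int msgSize)
{
    for (;;) {
        /* A1/A2: retire any finished DMA requests in DMAReqs. */
        for (int i = 0; i < dmaReqCount; ++i)
            if (dma_done(dmaReqs[i].dmaTag)) {
                set_state(dmaReqs[i].SMsg, SlaveReady);
                set_state(dmaReqs[i].MMsg, MasterIdle);
                remove_req(i--);             /* re-check the slot that shifted in */
            }
        /* A3/A4: pair MasterReady blocks with SlaveIdle blocks and start DMAs. */
        void *MMsg, *SMsg;
        while (take_master_ready(handle, &MMsg) && take_slave_idle(handle, &SMsg)) {
            set_state(MMsg, MTransferring);
            set_state(SMsg, STransferring);
            add_req(MMsg, SMsg, dma_start(SMsg, MMsg, msgSize));
        }
        /* A5: deliver the earliest SlaveReady message, or loop back to A1. */
        void *msg = take_earliest_slave_ready(handle);
        if (msg) { set_state(msg, SlaveLocked); return msg; }
    }
}
```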
Further, a slave core can access the master core's memory space in two different ways: direct access and asynchronous DMA transfer. Direct access has low efficiency and is suitable only for small amounts of data. An asynchronous DMA transfer consists of two steps, starting the DMA transfer and querying the DMA result; after starting a DMA transfer, the software can carry out other work without waiting for the DMA to finish, and learns whether the DMA has completed by querying the DMA result;
in the blocking send/receive process, the call returns only after the slave core has received the master core's message; otherwise it keeps waiting for a message sent by the master core;
when the slave core receives a message, it starts the DMA transfer of the master-part message blocks that are in the MasterReady state; when the slave core has two or more message blocks and the master core produces messages faster than the slave core consumes them, the slave core application's reading of one message and the DMA transfer of the next can proceed in parallel.
Further, a message queue is created by the master core, and a new queue handle is generated on both the master core and the slave core. On the master core side, handles are allocated per slave core number; on the slave core side, a unique queue entity can be determined by the handle number handle or the queue name qName. The handle of a queue on the master core is the same as its handle on the slave core; that is, the queue identified on the master core by (slaveID, handle) and the queue identified on slave core slaveID by handle are the same queue entity. The state of a specific message queue can be queried through its identification number handle, mainly including whether the queue exists, its direction, the message size, and the number of messages currently in the queue;
the interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
m5, mQueryQueue (), inquiring whether a message queue exists;
m6, mQueueDirection (), obtaining the queue direction of the message queue;
m7, mQueueMsgNumInMaster (), obtaining the number of messages which can be accommodated by a control core part of a message queue;
m8, mQueueMsgNumInSlave (), obtaining the number of messages which can be accommodated by a computing core part of a message queue;
M9, mQueueMsgSize (), obtaining the maximum byte number of each message in the message queue;
m10, mQueueMsgSlaveMemType (), obtaining the memory type of the slave core part of a message queue;
m11, mQueueMsgNumStatus (), obtaining the dynamic information of the message queue;
m12, mCreateQueue (), creating a message queue;
the interface provided by the local memory-based messaging system between the master and slave cores for the slave core application includes:
s5, sQueryQueue (), inquiring whether a message queue exists;
s6, sQueueDirection (), obtaining the queue direction of a message queue;
s7, sQueueMsgNumInMaster (), obtaining the number of messages which can be accommodated by a control core part of a message queue;
s8, sQueueMsgNumInSlave (), acquiring the number of messages which can be accommodated by a computing core part of a message queue;
s9, obtaining the maximum byte number of each message in a message queue;
s10, sQueueMsgSlaveMemType (), obtaining the memory type of the slave core part of a message queue;
s11, sQueueMsgNumStatus (), obtaining dynamic information of a message queue;
the interfaces M5-M12 are used for inquiring the related message queue information on the master core, and the interfaces S5-S11 are used for inquiring the related message queue information on the slave core.
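For illustration, a master-side snippet that checks a queue before using it is sketched below; the prototypes and the integer status output are assumptions, since the description only names the query interfaces.

```c
#include <stdio.h>

/* Assumed prototypes for query interfaces M5, M6 and M11. */
int mQueryQueue(int slaveID, const char *qName);             /* does the queue exist?     */
int mQueueDirection(int slaveID, int handle);                 /* direction of the queue    */
int mQueueMsgNumStatus(int slaveID, int handle, int *nMsgs);  /* messages currently queued */

void report_queue(int slaveID, int handle, const char *qName)
{
    if (!mQueryQueue(slaveID, qName)) {          /* M5: existence check */
        printf("queue %s on slave core %d does not exist\n", qName, slaveID);
        return;
    }
    int pending = 0;
    mQueueMsgNumStatus(slaveID, handle, &pending);   /* M11: dynamic information */
    printf("queue %s: direction=%d, pending=%d\n",
           qName, mQueueDirection(slaveID, handle), pending);
}
```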
Further, when a user creates a message queue, a dedicated message queue handle is generated, and the unique message queue can be obtained through the handle number or the queue name;
the corresponding state information of the message queue can be obtained on both the master core side and the slave core side; because the master core communicates with multiple slave cores, it determines a unique queue entity by (slave core number, handle number) or (slave core number, queue name); on the slave core, the handle number handle or queue name qName determines the unique queue entity.
Further, the following interfaces are provided on the different high-performance many-core processors. They cover the steps required for communication between the master core and the slave cores, including the slave-core management mechanism on the master core, so that code using them performs the corresponding functions and can be quickly ported to various high-performance many-core processors. When code is ported to a new platform, it only needs to be recompiled with the compilation options of that platform;
the interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
M13, mHaltDevice(): exit the running environment;
M14, mHMessQueueInit(): initialization method;
M15, mHMessQueueQuit(): logoff method for the control core part;
M16, mLoadDatFile(): load an image file to the device (only needed on MT3);
M17, mUnloadDatFile(): unload the image file from the device (only needed on MT3);
M18, mGetSlaveCoreNum(): obtain the number of computing cores;
M19, mGetMSize(): obtain the memory sizes of the control core and the computing cores, in bytes;
M20, mGetSlaveSIMDLanes(): obtain the number of lanes processed in parallel by the computing cores' SIMD instructions;
M21, mInitDevice(): load the running environment of the acceleration device;
M22, mTinitThreadID(): obtain the initialized thread data structure;
M23, mStartSlaveThread(): create, start and bind a thread group of computing cores;
M24, mWaitSlaveThreads(): wait for the thread group to terminate;
M25, mDestroySlaveThreads(): close the thread group;
M26, mSlaveThreadActive(): query whether a computing-core thread is active;
The interface provided by the local memory-based messaging system between the master and slave cores for the slave core application includes:
S12, sHMessQueueInit(): initialize the message queue on the slave core;
S13, logoff method for the computing core part cache;
S14, sGetSlaveNum(): obtain the number of computing cores;
S15, sGetSlaveID(): obtain the number of the current computing core;
S16, obtain the maximum number of bytes of each message in a message queue;
S17, sSIMDLanes(): obtain the number of lanes processed in parallel by the computing core's SIMD instructions;
M13-M20 are used to query related information on the master core, M21-M26 are used by the master core to manage slave-core threads, and S12-S17 are used to query related information on the slave core;
these interfaces cover the functions required by the current different high-performance many-core processors; the provided interface set I is the union of the low-level interface sets L_i of the different high-performance many-core processors, 1 ≤ i, i.e. I = L_1 ∪ L_2 ∪ …; if a processor's low-level library L_i has no function corresponding to an interface in I, that interface is implemented as an empty function, so calling it has no negative impact on the code;
by programming with macro definitions, different processors correspond to different predefined macros that are set at build time; calling the same interface then has the same effect on the different high-performance many-core processors, and the differences between the underlying libraries of the different many-core processors are encapsulated.
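A sketch of this macro-based encapsulation is given below: the same interface mStartSlaveThread() maps to different low-level libraries depending on a compile-time macro chosen by the build options. The macro names (USE_SW26010, USE_MT3), the mStartSlaveThread() signature and the hthread call are placeholders; only the Athread calls are real SW26010 library functions, used here in their commonly documented form.

```c
#if defined(USE_SW26010)                 /* build option for the SW26010 platform */
  #include <athread.h>
  int mStartSlaveThread(void (*fn)(void *), void *arg)
  {
      athread_init();                    /* Athread: set up the slave-core runtime */
      return athread_spawn(fn, arg);     /* Athread: start the slave thread group  */
  }
#elif defined(USE_MT3)                   /* build option for the E-level accelerator */
  extern int start_hthread_group(void (*fn)(void *), void *arg);  /* placeholder */
  int mStartSlaveThread(void (*fn)(void *), void *arg)
  {
      return start_hthread_group(fn, arg);   /* would call the hthread interfaces */
  }
#else                                    /* plain x86 build for development/debugging */
  #include <pthread.h>
  static pthread_t slaveThreads[64];
  int mStartSlaveThread(void (*fn)(void *), void *arg)
  {
      for (int i = 0; i < 64; ++i)       /* emulate the slave cores with pthreads */
          pthread_create(&slaveThreads[i], NULL, (void *(*)(void *))fn, arg);
      return 0;
  }
#endif
```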
Compared with the prior art, the invention has the advantages that:
Aiming at the problem that the thread programming libraries of domestic high-performance many-core processors are not uniform, the invention provides a slave-core thread management mechanism that controls threads on the various platforms. Aiming at the problem that each slave core has an independent memory space without Cache coherence, requiring the program to explicitly control data exchange between system main memory and each compute core's memory, the model provides message queues.
With the support of this programming model, high-performance computing software can be developed and debugged on an x86 platform and then ported to a domestic high-performance many-core processor. This not only effectively reduces development difficulty, but also allows the same software to be quickly migrated between the two different types of domestic high-performance microprocessors, effectively improving the efficiency of developing and porting domestic high-performance computing software.
Drawings
Fig. 1 is a diagram of a single heterogeneous group in the SW26010 processor.
FIG. 2 is a schematic diagram of an accelerator chip architecture for E-level high performance computing.
FIG. 3 is an abstract schematic diagram of the architecture of a domestic high-performance heterogeneous processor.
Fig. 4 is a schematic diagram of a memory structure of a message queue and a status of a message in an embodiment of the present invention.
FIG. 5 is a diagram illustrating a control information layout of a master-to-slave direction message queue in an embodiment of the present invention.
FIG. 6 is a flow chart of an implementation of a local memory-based message passing system between a master core and a slave core in an embodiment of the invention.
FIG. 7 is a graph of the performance of a password-guessing program in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, a detailed description of the specific implementation of the present invention will be given below with reference to the accompanying drawings and examples.
Examples:
A message transmission system between a master core and slave cores based on local memory includes a master core set M, whose elements are denoted m_1, …, m_|M|, where |M| represents the number of master cores in M. Each master core m_i corresponds to one or more slave core sets S_a, satisfying |S_a| = |S_b|, where |S_a| represents the number of slave cores in slave core set S_a and 1 ≤ a, b ≤ |M|;
through the slave-core thread management interface, a master core m_i can manage the slave core set S_i, 1 ≤ i ≤ |M|;
when creating the k-th message queue q_{i,j,k} from the i-th master core m_i to the j-th slave core s_{i,j} in the i-th slave core set S_i, where s_{i,j} ∈ S_i, 1 ≤ j ≤ |S_i| and 1 ≤ k, the calling interface can be used to create the corresponding message queue q_{i,j,k} in the memories of master core m_i and slave core s_{i,j}; all message queues q_{i,j,k} from master core m_i to slave core s_{i,j} form the set Q, q_{i,j,k} ∈ Q, which establishes the connection between master core m_i and slave core s_{i,j};
master core m_i or slave core s_{i,j} sends a series of messages r_x through the message sending mechanism into message queue q_{i,j,k}, yielding a message sequence set R; the messages in R are sent in order, where 1 ≤ x and r_x ∈ R;
slave core s_{i,j} or master core m_i selects the corresponding message r_x from the message sequence R according to message queue q_{i,j,k}, where r_x ∈ R and 1 ≤ x ≤ |R|; after the user obtains message r_x and completes the user-defined processing of r_x, message queue q_{i,j,k} releases the memory used by message r_x;
after slave core s_{i,j} has processed its data, the cache of slave core s_{i,j} is logged off; the thread running on master core m_i reclaims slave core s_{i,j} and continues processing the tasks of master core m_i; if there is no task left, master core m_i logs off its cache and the multi-threaded parallel program ends.
Further, creating a message queue on the primary core requires specifying the following parameters:
The message queue name qName (a string), the number slaveID of the connected slave core, the message size msgSize, the number of messages mSize held in the master core portion of the queue, the number of messages sSize held in the slave core portion, the starting address mQaddr of the master-core part of the queue, the memory type sType occupied by the queue on the slave core, and the direction of the queue; after a successful call, a handle number handle is returned;
wherein the master core identifies a queue entity with (slave core number, handle number) or (slave core number, queue name); the slave core takes the handle number or the queue name as the unique identification number of the queue to determine a unique queue entity; the handles of the same queue on the master core are the same as the handles on the slave cores;
the message queue is only used for communication between the master core and a slave core, and the user can specify the slave core slaveID where the queue is located; between a given pair of master core and slave core, a plurality of different message queues may be created;
the size of each message in the message queue is not greater than msgSize bytes;
a message queue is distributed in a main core memory and a local memory of a slave core, and the number of the messages held by the main core memory and the local memory is mSize and sSize respectively;
The initial address of the message queue on the main core memory is a continuous memory space designated by an application program, and the initial address is mQaddr;
if the local memories on the slave cores are of different types, the type of local memory occupied by the message queue may be specified by the slave core memory type sType;
each message queue has a single direction, either master-core-write/slave-core-read or slave-core-write/master-core-read, specified by the direction parameter;
the master core can create a plurality of message queues between the master core and one slave core, and the message queues between the master core and all the slave cores form a message queue set;
the master core controls slave-core threads through the slave-core thread management interface, whose main functions are: creating and starting a slave-core thread group, waiting for the thread group to terminate, closing the thread group, and loading an image file from the master core to the device.
Further, a message queue has a continuous storage space for message contents in both the master core part and the slave core part; the numbers of messages they can hold are mSize and sSize respectively, occupying mSize × msgSize bytes and sSize × msgSize bytes of memory, as shown in fig. 4; the capacity of the slave-core part of the queue is limited by the capacity of the local memory;
The control information layout of each message queue is divided into two parts: a status list and a location index;
the position indices are: IMTran, IMReady, IMLocked and IMIdle, associated with master-core block positions, and ISTran, ISReady, ISLocked and ISIdle, associated with slave-core block positions; their placement differs with the queue direction. In the control-information layout of a master-to-slave message queue, IMLocked and IMIdle are stored in the master-core address area, while IMTran, IMReady and the four IS* indices are located in the slave-core local memory. In the control-information layout of a slave-to-master message queue, IMReady, IMLocked and IMIdle are stored in the master-core address area, while IMTran and the four IS* indices are located in the slave-core local memory;
IMTran indicates that the first message block state in the main core space is the message position index in transmission; IMReady represents the message location index in the main core space where the first message block state is ready for a message; IMLocked represents the first message block state in the main core space as the message position index in the message lock; IMIdle indicates that the first message block state in the main core space is the message position index in the message idle;
ISTran indicates that the first message block state in kernel space is the message location index in transmission; ISReady represents the message location index that is ready for a message from the first message block state in the kernel space; ISLocked denotes the message location index from the first message block state in core space in message lock; ISIdle indicates that the first message block state in the kernel space is the message position index in the message idle;
each state in the state list corresponds to each message block in the annular message block data area one by one; the message block state list of the master core part and the message block state list of the slave core part are respectively marked as MState and SState and are respectively positioned in a master core address area and a slave core local memory;
a message queue is divided into a master core part and a slave core part;
when the message queue is created, the number of messages that the master core portion and the slave core portion can accommodate is already determined;
the position indices in the message queue control-information layout are placed differently for different queue directions: variables the master core does not need are stored in the slave-core local memory, which reduces accesses by slave-core code to master-core variables and improves the performance of the model.
Further, in a master-to-slave message queue, the states of a message block in the master core portion include: MasterIdle, MasterLocked, MasterReady and MTransferring; the states of a message block in the slave core portion include: SlaveIdle, STransferring, SlaveReady and SlaveLocked; the state information of each message block is stored in the respective memory;
After the message queue is created, all message blocks of the master core part are in a MasterIdle state, and all message blocks of the slave core part are in a SlaveIdle state;
MasterIdle indicates that the message block in the main core is in an idle and allocable state, masterLocked indicates that the message block in the main core is in a locking state, masterReady indicates that the message block in the main core is in a ready and available state, and MTransferring indicates that the message block in the main core is in a transmission state;
SlaveIdle indicates that the message block in the slave core is in an idle and allocable state, STransferring indicates that the message block in the slave core is in a transmission state, slaveReady indicates that the message block in the slave core is in a ready and usable state, and SlaveLocked indicates that the message block in the slave core is in a locking state;
the interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
M1, mAllocateMsg(): obtain the address of a message block in the master core part of a message queue;
M2, mSendMsg(): start the master core transferring a message to the slave core;
M3, mRecvMsg(): receive a message sent by the slave core;
M4, mReleaseMsg(): release a message block of the master core part;
the interfaces the message queue system provides for the slave core application program include:
S1, sRecvMsg(): receive a message sent by the master core;
S2, sReleaseMsg(): release a message block of the slave core part;
S3, sAllocateMsg(): obtain the address of a message block in the slave core part of a message queue;
S4, sSendMsg(): start the slave core transferring a message to the master core;
among these interfaces, M1, M2, S1 and S2 are used for the master core to pass messages to the slave core, and M3, M4, S3 and S4 are used for the slave core to pass messages to the master core.
Further, the sequence of operations for sending a message from the master core to the slave core includes the following steps (a brief usage sketch is given after the sequence):
a1, the master core application program calls mAllocateMsg(); the local memory-based message passing system between the master core and the slave core allocates the idle message block pointed to by the position index IMIdle in the master core part of the message queue, sets the block to the MasterLocked state, cyclically advances IMIdle, and returns the block address MasterMsg to the master core application program;
a2, the master core application program writes the message to be sent into the idle message block pointed to by MasterMsg;
a3, the master core application program calls mSendMsg(); the local memory-based message passing system between the master core and the slave core obtains the first message block MasterMsg pointed to by the position index IMLocked, sets the message block MasterMsg to the MasterReady state, and cyclically advances IMLocked;
a4, the local memory-based message passing system between the master core and the slave core, at an appropriate time, allocates for the message block to be transmitted the idle message block storage space SlaveMsg pointed to by the position index ISIdle in the slave core part, and cyclically advances ISIdle; it acquires the first message block MasterMsg pointed to by the position index IMReady, sets the message block MasterMsg to the MTransferring state, sets SlaveMsg to the STransferring state, and starts DMA to transfer the message block in MasterMsg to SlaveMsg; after the DMA transfer finishes, the local memory-based message passing system between the master core and the slave core sets the message block SlaveMsg pointed to by the slave core message block position index ISTran to the SlaveReady state, and sets the message block MasterMsg pointed to by the master core message block position index IMTran to the MasterIdle state;
a5, the slave core application program calls sRecvMsg(); the message queue returns the message block SlaveMsg pointed to by the position index ISReady of the slave core part to the slave core application program, and sets the message block SlaveMsg to the SlaveLocked state;
a6, the slave core application program reads the content in SlaveMsg;
a7, the slave core application program calls sReleaseMsg(); the message queue sets the slave core message block SlaveMsg to the SlaveIdle state;
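The following minimal C sketch walks through the a1-a7 sequence, assuming the prototypes sketched above; the payload and the use_payload() helper are illustrative only.

#include <string.h>

extern void use_payload(const char *msg);      /* application-specific; hypothetical */

/* Master core side: steps a1-a3. */
void master_send_example(QueueHandle q)
{
    char *MasterMsg = (char *)mAllocateMsg(q);  /* a1: block becomes MasterLocked            */
    memcpy(MasterMsg, "hello slave", 12);       /* a2: write the message into the block      */
    mSendMsg(q);                                /* a3: block becomes MasterReady; the system
                                                   later moves it to the slave part by DMA   */
}

/* Slave core side: steps a5-a7. */
void slave_recv_example(QueueHandle q)
{
    char *SlaveMsg = (char *)sRecvMsg(q);       /* a5: block becomes SlaveLocked             */
    use_payload(SlaveMsg);                      /* a6: read the content in place, no copy    */
    sReleaseMsg(q, SlaveMsg);                   /* a7: block returns to SlaveIdle            */
}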
The sequence of operations for the slave core to send a message to the master core includes the following steps (a brief sketch is likewise given after the sequence):
b1, the slave core application program calls sAllocateMsg(); the local memory-based message passing system between the master core and the slave core allocates the idle message block pointed to by the position index ISIdle in the slave core part of the message queue, sets the block to the SlaveLocked state, cyclically advances ISIdle, and returns the block address SlaveMsg to the slave core application program;
b2, the slave core application program writes the message to be sent into the idle message block pointed to by SlaveMsg;
b3, the slave core application program calls sSendMsg(); the local memory-based message passing system between the master core and the slave core obtains the first message block SlaveMsg pointed to by the position index ISLocked, sets the message block SlaveMsg to the SlaveReady state, and cyclically advances ISLocked;
b4, the local memory-based message passing system between the master core and the slave core, at an appropriate time, allocates for the message block to be transmitted from the slave core part the idle message block storage space MasterMsg pointed to by the position index IMIdle; it sets the message block MasterMsg to the MTransferring state, sets SlaveMsg to the STransferring state, and starts DMA to transfer the message block in SlaveMsg to MasterMsg; after the DMA transfer finishes, the local memory-based message passing system between the master core and the slave core sets the MasterMsg pointed to by the master core message block position index IMTran to the MasterReady state, and sets the SlaveMsg pointed to by the slave core message block position index ISTran to the SlaveIdle state;
b5, the master core application program calls mRecvMsg(); the message queue returns the message block address MasterMsg pointed to by the position index IMReady of the master core part to the master core application program;
b6, the master core application program reads the content in MasterMsg;
b7, the master core application program calls mReleaseMsg(); the message queue sets the master core message block MasterMsg to the MasterIdle state;
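Likewise, a minimal sketch of the b1-b7 sequence in the opposite direction, under the same assumed prototypes and helper.

/* Slave core side: steps b1-b3. */
void slave_send_example(QueueHandle q)
{
    char *SlaveMsg = (char *)sAllocateMsg(q);   /* b1: block becomes SlaveLocked        */
    memcpy(SlaveMsg, "result", 7);              /* b2: write the message into the block */
    sSendMsg(q);                                /* b3: block becomes SlaveReady         */
}

/* Master core side: steps b5-b7. */
void master_recv_example(QueueHandle q)
{
    char *MasterMsg = (char *)mRecvMsg(q);      /* b5: block pointed to by IMReady      */
    use_payload(MasterMsg);                     /* b6: read the content in place        */
    mReleaseMsg(q, MasterMsg);                  /* b7: block returns to MasterIdle      */
}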
the application program of the master core or the slave core directly reads and writes the content of the message block in the memory area managed by the message queue, and the message content is not required to be moved to other memory spaces; therefore, the data moving expense of the message content can be reduced, and the use amount of the secondary core local memory can be effectively reduced;
the master core or slave core application program merely initiates the sending of a message or receives a message, without being concerned with the specific implementation of message transfer between the master core and the slave core; the implementation of message transfer is completed by the local memory-based message passing system between the master core and the slave core; on the one hand this simplifies application program design, and at the same time it gives the application program better portability.
Further, the blocking message transmission process between the master core and the slave core is specifically as follows (a sketch of the receive-side loop is given after the steps):
each message queue maintains a set of DMA requests DMAReqs; the set is initialized to an empty set;
the slave core application program calls the interface sRecvMsg(); within sRecvMsg(), the following steps are performed:
A1. Judge whether the DMA request set DMAReqs of the message queue is empty; if it is not empty, execute step A2, otherwise execute step A3;
A2. Check each request req in DMAReqs in turn and check whether the request req has completed its DMA; if not, ignore it; if it has completed, set the req.SMsg state to SlaveReady, set the req.MMsg state to MasterIdle, and remove req from DMAReqs;
A3. Judge whether a message block SMsg in the SlaveIdle state can be obtained from the slave core part; if so, execute step A4, otherwise directly execute step A5;
A4. Set the state of the message block corresponding to MMsg to the MTransferring state, set the state of the message block corresponding to SMsg to the STransferring state, start an asynchronous DMA request of MsgSize bytes from MMsg to SMsg, add req to DMAReqs, and execute step A3 again;
A5. If a message in the slave core part is in the SlaveReady state, set the earliest SlaveReady message Msg to the SlaveLocked state, return Msg to the application program and end; otherwise, execute step A1;
wherein the DMA request set DMAReqs is initialized to null;
the slave core application program calls the interface sSendMsg(); within sSendMsg(), the following steps are performed:
B1. Judge whether the DMA request set DMAReqs of the message queue is empty; if it is not empty, execute step B2, otherwise execute step B3;
B2. Check each request req in DMAReqs in turn and check whether the request req has completed its DMA; if not, ignore it; if it has completed, set the req.SMsg state to SlaveIdle, set the req.MMsg state to MasterReady, and remove req from DMAReqs;
B3. Judge whether a message block SMsg in the SlaveReady state can be obtained from the slave core part; if so, execute step B4, otherwise directly execute step B5;
B4. Set the state of the message block corresponding to MMsg to the MTransferring state, set the state of the message block corresponding to SMsg to the STransferring state, start an asynchronous DMA request of MsgSize bytes from SMsg to MMsg, add req to DMAReqs, and execute step B3 again;
B5. If the message to be sent in the slave core part is in the SlaveLocked state, set that SlaveLocked message Msg to the SlaveReady state, return Msg to the application program and end; otherwise, execute step B1;
wherein the DMA request set DMAreqs is initialized to null.
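The receive-side loop A1-A5 can be sketched as follows. The data structures, the helper functions (find_*, set_*_state, add_req) and the DMA primitives (dma_start_async, dma_done) are placeholders assumed for this sketch, and the state enumerators are those of the earlier sketch; none of these names is defined by the text. The pairing of a SlaveIdle block with a MasterReady block in step A3/A4 follows the description of the blocking receive given below.

#include <stddef.h>

#define MAX_REQS 8                      /* illustrative capacity of the DMA request set */

typedef struct { void *MMsg, *SMsg; int tag, in_use; } DmaReq;

typedef struct {
    DmaReq DMAReqs[MAX_REQS];           /* per-queue DMA request set, initially empty   */
    size_t MsgSize;                     /* maximum message size in bytes                */
    /* ... MState/SState lists and the position indexes are omitted here ...            */
} MsgQueue;

/* Placeholder helpers standing in for the queue internals and the DMA engine. */
extern void *find_master_block(MsgQueue *q, int state);
extern void *find_slave_block(MsgQueue *q, int state);
extern void  set_master_state(MsgQueue *q, void *blk, int state);
extern void  set_slave_state(MsgQueue *q, void *blk, int state);
extern int   dma_start_async(void *src, void *dst, size_t len);  /* returns a poll tag */
extern int   dma_done(int tag);
extern void  add_req(MsgQueue *q, void *MMsg, void *SMsg, int tag);

void *sRecvMsg_sketch(MsgQueue *q)
{
    for (;;) {
        /* A1/A2: retire every finished DMA request in DMAReqs. */
        for (int i = 0; i < MAX_REQS; i++) {
            DmaReq *req = &q->DMAReqs[i];
            if (req->in_use && dma_done(req->tag)) {
                set_slave_state(q, req->SMsg, SlaveReady);
                set_master_state(q, req->MMsg, MasterIdle);
                req->in_use = 0;                       /* remove req from DMAReqs */
            }
        }
        /* A3/A4: while a SlaveIdle block and a MasterReady block exist, start a DMA. */
        for (;;) {
            void *SMsg = find_slave_block(q, SlaveIdle);
            void *MMsg = find_master_block(q, MasterReady);
            if (SMsg == NULL || MMsg == NULL)
                break;
            set_master_state(q, MMsg, MTransferring);
            set_slave_state(q, SMsg, STransferring);
            add_req(q, MMsg, SMsg, dma_start_async(MMsg, SMsg, q->MsgSize));
        }
        /* A5: hand the earliest SlaveReady block to the application, else loop to A1. */
        void *Msg = find_slave_block(q, SlaveReady);
        if (Msg != NULL) {
            set_slave_state(q, Msg, SlaveLocked);
            return Msg;
        }
    }
}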
Further, the slave core accesses the memory space of the master core in two different ways: direct access and asynchronous DMA transfer; the direct access mode is less efficient and is suitable for small amounts of data; the asynchronous DMA transfer mode comprises two steps, starting the DMA transfer process and querying the DMA result; after starting a DMA transfer, the software system can perform other work without waiting for the DMA to end, and learns whether the DMA has finished by querying the DMA result;
in the blocking send/receive process between the master core and the slave core, the call returns only after the slave core has received the message of the master core; otherwise it keeps waiting for the master core to send a message;
when the slave core receives messages, it starts the DMA transfer process for the messages in the MasterReady state in the master core part; when the slave core has two or more message blocks and the speed at which the master core sends messages is higher than the speed at which the slave core consumes them, the reading of messages by the slave core application program and the DMA transfer process can proceed in parallel.
Further, the message queue is created by the master core, and a new queue handle is generated in both the master core and the slave core; on the master core side, handles are kept partitioned by slave core number; on the slave core side, a unique queue entity can be determined by the handle number handle or the queue name qName; the handle of the same queue on the master core is the same as the handle on the slave core, that is, the queue identified on the master core by (slaveID, handle) and the queue identified on slave core slaveID by handle are the same queue entity; the state of a specific message queue can be queried through its identification number handle, mainly including whether the queue exists, its direction, the message size, and the number of messages currently in the queue;
the interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
m5, mQueryQueue (), inquiring whether a message queue exists;
m6, mQueueDirection (), obtaining the queue direction of the message queue;
m7, mQueueMsgNumInMaster (), obtaining the number of messages which can be accommodated by a control core part of a message queue;
m8, mQueueMsgNumInSlave (), obtaining the number of messages which can be accommodated by a computing core part of a message queue;
m9, mQueueMsgSize(), obtaining the maximum number of bytes of each message in a message queue;
m10, mQueueMsgSlaveMemType(), acquiring the memory type of the slave core part in a message queue;
m11, mQueueMsgNumStatus (), obtaining the dynamic information of the message queue;
m12, mCreateQueue (), creating a message queue;
the interface provided by the local memory-based messaging system between the master and slave cores for the slave core application includes:
s5, sQueryQueue (), inquiring whether a message queue exists;
s6, sQueueDirection () is carried out to obtain the queue direction of the message queue;
s7, sQueueMsgNumInMaster (), obtaining the number of messages which can be accommodated by a control core part of a message queue;
s8, sQueueMsgNumInSlave (), acquiring the number of messages which can be accommodated by a computing core part of a message queue;
s9, sQueueMsgSize(), obtaining the maximum number of bytes of each message in a message queue;
s10, sQueueMsgSlaveMemType(), acquiring the memory type of the slave core part in a message queue;
s11, sQueueMsgNumStatus (), obtaining dynamic information of a message queue;
the interfaces M5-M12 are used for querying related message queue information on the master core, and the interfaces S5-S11 are used for querying related message queue information on the slave core; a usage sketch of the master-side query interfaces follows.
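Since the text gives only the call names, the parameter lists below (slave core number, handle, queue name) are assumptions chosen to match the queue identification described in this document.

#include <stdio.h>

/* Assumed signatures for the query interfaces. */
extern int mQueryQueue(int slaveID, const char *qName);       /* M5  */
extern int mQueueDirection(int slaveID, int handle);          /* M6  */
extern int mQueueMsgNumInMaster(int slaveID, int handle);     /* M7  */
extern int mQueueMsgNumInSlave(int slaveID, int handle);      /* M8  */
extern int mQueueMsgSize(int slaveID, int handle);            /* M9  */
extern int mQueueMsgNumStatus(int slaveID, int handle);       /* M11 */

void report_queue(int slaveID, int handle, const char *qName)
{
    if (!mQueryQueue(slaveID, qName))                          /* M5: does the queue exist?    */
        return;
    printf("dir=%d masterCap=%d slaveCap=%d msgSize=%dB pending=%d\n",
           mQueueDirection(slaveID, handle),                   /* M6: queue direction          */
           mQueueMsgNumInMaster(slaveID, handle),              /* M7: master part capacity     */
           mQueueMsgNumInSlave(slaveID, handle),               /* M8: slave part capacity      */
           mQueueMsgSize(slaveID, handle),                     /* M9: max bytes per message    */
           mQueueMsgNumStatus(slaveID, handle));               /* M11: messages currently held */
}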
Further, when a user creates a message queue, a dedicated message queue handle is generated, and the unique message queue can be obtained through a handle number or a queue name;
the corresponding state information of a message queue can be obtained on both the master core side and the slave core side; because the master core communicates with multiple slave cores, it determines a unique queue entity by (slave core number, handle number) or (slave core number, queue name); on the slave core, the handle number handle or the queue name qName determines the unique queue entity.
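A small C sketch of the two identification schemes described above; the structure layouts are assumptions used only to illustrate the keys.

/* Master core side: a queue entity is identified by (slave core number, handle number)
 * or (slave core number, queue name). */
typedef struct {
    int  slaveID;                 /* slave core the queue connects to            */
    int  handle;                  /* handle number returned at creation          */
    char qName[32];               /* queue name, also usable as part of the key  */
    /* ... queue control information ... */
} MasterQueueEntry;

/* Slave core side: the handle number handle (or the queue name qName) alone
 * identifies the unique queue entity. */
typedef struct {
    int  handle;
    char qName[32];
    /* ... queue control information ... */
} SlaveQueueEntry;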
Further, the following interfaces are provided on different high-performance many-core processors; they cover the steps required for communication between the master core and the slave core, including the slave core management mechanism on the master core, and allow code to be quickly ported to a variety of high-performance many-core processors while performing the corresponding functions; when the code is ported to a new platform, it only needs to be recompiled, specifying the compilation options of the corresponding platform;
the interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
m13, mHaltDevice(), exiting the running environment;
m14, mHMessQueueInit(), initialization method on the control core;
m15, mHMessQueueQuit(), deregistration method on the control core;
m16, mLoadDatFile(), loading an image file to the device, needed only on MT3;
m17, mUnloadDatFile(), unloading the image file from the device, needed only on MT3;
m18, mGetSlaveCoreNum(), obtaining the number of computing cores;
m19, mGetMSize(), obtaining the memory sizes of the control core and the computing core, in bytes;
m20, mGetSlaveSIMDLanes(), obtaining the number of lanes processed in parallel by the SIMD instructions of a computing core;
m21, mInitDevice(), loading the running environment of the acceleration device;
m22, mTinitThreadID(), obtaining the initialized thread data structure;
m23, mStartSlaveThread(), creating, starting and binding a thread group of computing cores;
m24, mWaitSlaveThreads(), waiting for the thread group to terminate;
m25, mDestroySlaveThreads(), closing the thread group;
m26, mSlaveThreadActive(), obtaining whether a computing core thread is active;
The interface provided by the local memory-based messaging system between the master and slave cores for the slave core application includes:
s12, sHMessQueueInit(), initializing the message queue mechanism on the slave core;
s13, deregistration method of the computing core part cache;
s14, sGetSlaveNum(), obtaining the number of computing cores;
s15, sGetSlaveID(), obtaining the number of the current computing core;
s16, obtaining the maximum number of bytes of each message in a message queue;
s17, sSIMDLanes(), obtaining the number of lanes processed in parallel by the SIMD instructions of a computing core;
M13-M20 are used for querying related information on the master core, M21-M26 are used for managing slave core threads on the master core, and S12-S20 are used for querying related information on the slave core;
the interfaces cover the functions required by the current different high-performance many-core processors; the provided interface set I is the union of the underlying interface sets L_i of the different high-performance many-core processors, that is, I = L_1 ∪ L_2 ∪ …, 1 ≤ i; if the underlying library L_i of some processor has no function corresponding to an interface in I, the upper-layer call of that interface is an empty function, so the code is not adversely affected;
Through programming with macro definitions, different processors correspond to different predefined macros, and the appropriate predefined macro is set for each processor; calling the same interface on different high-performance many-core processors then achieves the same effect, and the differences between the underlying libraries of different many-core processors are encapsulated. A minimal sketch of this technique follows.
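The sketch below assumes two hypothetical platform macros (PLATFORM_A, PLATFORM_B) and placeholder underlying calls; none of the platform or library names come from the text.

/* Exactly one platform macro is predefined by the build, e.g.
 *   cc -DPLATFORM_A ...    or    cc -DPLATFORM_B ...                        */
#if defined(PLATFORM_A)
  #include "platform_a_runtime.h"            /* placeholder header            */
  static inline void mLoadDatFile(const char *path) {
      platform_a_load_image(path);           /* placeholder underlying call   */
  }
#elif defined(PLATFORM_B)
  #include "platform_b_runtime.h"            /* placeholder header            */
  static inline void mLoadDatFile(const char *path) {
      platform_b_upload(path);               /* placeholder underlying call   */
  }
#else
  /* The platform's underlying library has no corresponding function:
   * the upper-layer call degenerates to an empty function, so callers
   * are not affected. */
  static inline void mLoadDatFile(const char *path) { (void)path; }
#endif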
In a specific embodiment, the implementation flow of the local memory-based message passing system between the master core and the slave core, shown in fig. 6, comprises the following steps (a master-side sketch is given after the steps):
Step 1, determine the platform t_1 on which the program runs;
Step 2, the programming model initializes the master core m_i and each slave core s_i,j in the corresponding slave core set S_i through the initialization mechanism, and starts the slave core thread management mechanism, where s_i,j ∈ S_i; a master core m_i may manage the slave core set S_i;
Step 3, for the platform t_1 on which the code runs, create the corresponding message queue set Q in the memory of the master core m_i according to the message passing interface of the invention, and use the queue interface to establish the connection between the master core and the slave core;
Step 4, the master core/slave core uses the message mechanism of the invention to send a message r to the message queue q_i, where q_i ∈ Q, obtaining a message sequence R so that messages r are sent in order;
Step 5, the slave core/master core selects the corresponding message r_i from R according to the related information of the queue, where r_i ∈ R and 1 ≤ i ≤ |R|, and the memory used by the message is released from the message queue; the data sent by the slave core is received and processed accordingly;
Step 6, after the slave core s_i,j has processed its data, it deregisters the slave core cache; the master core m_i reclaims the thread of s_i,j and checks whether tasks remain; if not, the master core deregisters the cache, and the multithreaded program ends in parallel.
Step 7, to port the program to another platform t_2, recompile it and specify the compilation options of the corresponding platform; the code can then run.
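A master-side sketch of steps 1-7 follows, written against the interface names listed earlier; the parameter lists and concrete values are assumptions, since the text names the creation parameters (qName, slaveID, msgSize, mSize, sSize, mQaddr, sType, direction) but not their order or types.

/* Assumed signatures; the text lists only the interface names. */
extern int  mInitDevice(void);                               /* m21 */
extern int  mHMessQueueInit(void);                           /* m14 */
extern int  mCreateQueue(int slaveID, const char *qName, int msgSize,
                         int mSize, int sSize, void *mQaddr,
                         int sType, int direction);          /* m12 */
extern int  mStartSlaveThread(void);                         /* m23 */
extern int  mWaitSlaveThreads(void);                         /* m24 */
extern int  mDestroySlaveThreads(void);                      /* m25 */
extern void mHMessQueueQuit(void);                           /* m15 */
extern void mHaltDevice(void);                               /* m13 */

int main(void)
{
    mInitDevice();                   /* step 2: load the running environment        */
    mHMessQueueInit();               /* step 2: initialise the queue mechanism      */

    static char qMem[64 * 1024];     /* step 3: master-side memory for one queue    */
    int handle = mCreateQueue(0, "q0", 256, 8, 4, qMem, 0, 0);  /* illustrative values */

    mStartSlaveThread();             /* start the slave core thread group           */
    /* steps 4-5: exchange messages with mSendMsg()/mRecvMsg() as sketched earlier. */
    mWaitSlaveThreads();             /* step 6: wait for the thread group           */
    mDestroySlaveThreads();          /* step 6: close the thread group              */
    mHMessQueueQuit();               /* step 6: deregister the queue mechanism      */
    mHaltDevice();                   /* exit the running environment                */
    (void)handle;
    return 0;
}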
In a specific embodiment, on a variety of high-performance many-core processors, that is, architectures with a small number of master cores and many slave cores in which the slave cores use local memory without cache coherence, master-slave core communication programming is needed. The invention can effectively improve the portability of application software and the research and development capability of high-performance computing software.
In a specific embodiment, the password guessing procedure is performed by using a high-performance many-core processor, specifically as follows:
In this embodiment, an MD5 password guessing program needs to run on multiple many-core processors. The ciphertext used in the experiment is 25d55ad283aa400af464c76d713c07ad, and the corresponding password is 12345678. The many-core processors adopt different organization structures; without the invention, the code would have to be restructured twice, once for each processor. Based on the invention, two queues, 'play' and 'result', are established between the main processor and slave cores 1-N of the many-core processors. The multithreading model is then used for communication, and the program can be directly ported to and run on multiple many-core processors. The handles formed are shown in Table 1 (a queue-creation sketch follows the table):
TABLE 1 (the table of handles is shown as an image in the original publication and is not reproduced here)
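A sketch of how the two queues of this embodiment could be created for slave cores 1-N, using the creation parameters described earlier; the number of slave cores, message sizes, capacities and direction encodings are illustrative assumptions.

#define N_SLAVES 4                        /* illustrative number of slave cores   */
#define MSG_SIZE 64                       /* illustrative message size in bytes   */

extern int mCreateQueue(int slaveID, const char *qName, int msgSize,
                        int mSize, int sSize, void *mQaddr,
                        int sType, int direction);      /* assumed signature */

static char playMem[N_SLAVES][8 * MSG_SIZE];            /* master-side queue memory */
static char resultMem[N_SLAVES][8 * MSG_SIZE];

void create_guessing_queues(int playHandle[], int resultHandle[])
{
    for (int i = 0; i < N_SLAVES; i++) {
        /* master -> slave: candidate passwords to test ("play" queue)   */
        playHandle[i]   = mCreateQueue(i, "play",   MSG_SIZE, 8, 4,
                                       playMem[i],   0, 0 /* to slave  */);
        /* slave -> master: cracking results ("result" queue)            */
        resultHandle[i] = mCreateQueue(i, "result", MSG_SIZE, 8, 4,
                                       resultMem[i], 0, 1 /* to master */);
    }
}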
As shown in fig. 6, the trends of the three test methods based on the present invention are basically consistent with the trend of the password guessing algorithm that does not use the invention, and the performance is level with that of the code that does not use the invention. The invention therefore increases portability without affecting program performance.

Claims (10)

1. A local memory-based message passing system between a master core and slave cores, characterized by comprising a master core set M, whose members are denoted m_1, …, m_|M|, where |M| represents the number of master cores in the master core set M; a master core m_i corresponds to one or more slave core sets S_a, satisfying |S_a| = |S_b|, where |S_a| represents the number of slave cores in the slave core set S_a, 1 ≤ a, b ≤ |M|;
a master core m_i may manage the slave core set S_i via the slave core thread management interface, 1 ≤ i ≤ |M|;
wherein, when creating the k-th message queue q_i,j,k from the i-th master core m_i to the j-th slave core s_i,j in the i-th slave core set S_i, where s_i,j ∈ S_i, 1 ≤ j ≤ |S_i|, 1 ≤ k, the calling interface can be used to create the corresponding message queue q_i,j,k in the memory of the master core m_i and the slave core s_i,j; the message queues q_i,j,k of all master cores m_i to slave cores s_i,j constitute the set Q, q_i,j,k ∈ Q, completing the connection between the master core m_i and the slave core s_i,j;
the master core m_i or the slave core s_i,j sends a series of messages r_x through the message sending mechanism to the message queue q_i,j,k, obtaining a message sequence set R whose messages are sent in order, where 1 ≤ x and r_x ∈ R;
the slave core s_i,j or the master core m_i selects the corresponding message r_x from the message sequence R according to the related information of the message queue q_i,j,k, where r_x ∈ R, 1 ≤ x ≤ |R|; after the user obtains the message r_x and completes the user-defined processing of the message r_x, the message queue q_i,j,k releases the memory used by the message r_x;
after the slave core s_i,j has processed its data, the cache of the slave core s_i,j is deregistered; the master core m_i reclaims the thread of the slave core s_i,j and continues processing the tasks of the master core m_i; if there are no tasks, the master core m_i deregisters the cache, and the multithreaded program ends in parallel.
2. The local memory-based inter-master and slave messaging system according to claim 1, wherein the creation of a message queue on the master requires specifying the following parameters:
the message queue name qName of the character string type, the slave core number slave ID connected, the message size msgSize, the message quantity mSize contained in the master core part of the message queue, the message quantity sSize contained in the slave core part, the starting address mQaddr of the master core message queue, the memory type sType occupied by the message queue in the slave core and the direction of the message queue; after the call is successful, a handle number handle is returned;
Wherein the master core identifies a queue entity with (slave core number, handle number) or (slave core number, queue name); the slave core takes the handle number or the queue name as the unique identification number of the queue to determine a unique queue entity; the handles of the same queue on the master core are the same as the handles on the slave cores;
the message queue is only used for communication between the master core and the slave core, and the user can specify the slave core slaveID where the queue is located; between a pair of master core and slave core, a plurality of different message queues may exist;
the size of each message in the message queue is not greater than msgSize bytes;
a message queue is distributed in a main core memory and a local memory of a slave core, and the number of the messages held by the main core memory and the local memory is mSize and sSize respectively;
the initial address of the message queue on the main core memory is a continuous memory space designated by an application program, and the initial address is mQaddr;
if the local memories on the slave cores are of different types, the type of local memory occupied by the message queue may be specified by the slave core memory type sType;
the message queue adopts one direction, and is divided into a main core writing/reading direction and a slave core writing/reading direction, and the directions are specified by the direction parameters;
The master core can create a plurality of message queues between the master core and one slave core, and the message queues between the master core and all the slave cores form a message queue set;
the master core completes the control of the slave core thread according to the slave core thread management interface, mainly creates and starts a slave core thread group for the interface, waits for the thread group to terminate, closes the thread group and loads an image file to the device by the master core.
3. The local memory-based message passing system between a master core and a slave core according to claim 2, wherein a message queue has a continuous block of memory space for storing message contents in both the master core portion and the slave core portion; the numbers of messages that the two portions can accommodate are mSize and sSize respectively, and the memory capacities occupied are mSize x msgSize bytes and sSize x msgSize bytes respectively; the capacity of the slave core portion of the message queue is limited by the capacity of the local memory;
the control information layout of each message queue is divided into two parts: a status list and a location index;
the position index is divided into: IMTran, IMReady, IMLocked and IMIdle associated with the master core location, ISTran, ISReady, ISLocked and ISIdle associated with the slave core location; according to different message queue directions, different designs are also provided, and in the message queue control information layout of the master core to the slave core, IMLocked and IMIdle are stored in a master core address area; IMTran, IMReady and the remaining 4 location indices are located in the slave core local memory; while IMReady, IMLocked and IMIdle are stored in the master core address area in the message queue control information layout sent from the slave core to the master core; the IMTran and the other 4 position indexes are both located in the slave core local memory;
IMTran indicates that the first message block state in the main core space is the message position index in transmission; IMReady represents the message location index in the main core space where the first message block state is ready for a message; IMLocked represents the first message block state in the main core space as the message position index in the message lock; IMIdle indicates that the first message block state in the main core space is the message position index in the message idle;
ISTran indicates that the first message block state in kernel space is the message location index in transmission; ISReady represents the message location index that is ready for a message from the first message block state in the kernel space; ISLocked denotes the message location index from the first message block state in core space in message lock; ISIdle indicates that the first message block state in the kernel space is the message position index in the message idle;
each state in the state list corresponds to each message block in the annular message block data area one by one; the message block state list of the master core part and the message block state list of the slave core part are respectively marked as MState and SState and are respectively positioned in a master core address area and a slave core local memory;
a message queue is divided into a master core part and a slave core part;
At the time of message queue creation, the number of messages that the master core portion and the slave core portion can accommodate has been determined.
4. The local memory-based message passing system between a master core and a slave core according to claim 3, wherein in the message queue from the master core to the slave core, the state of a message block in the master core portion includes: MasterIdle, MasterLocked, MasterReady, MTransferring; the state of a message block in the slave core portion includes: SlaveIdle, STransferring, SlaveReady, SlaveLocked; the state information of each message block is stored in the memory of its own side;
after the message queue is created, all message blocks of the master core part are in a MasterIdle state, and all message blocks of the slave core part are in a SlaveIdle state;
MasterIdle indicates that the message block in the main core is in an idle and allocable state, masterLocked indicates that the message block in the main core is in a locking state, masterReady indicates that the message block in the main core is in a ready and available state, and MTransferring indicates that the message block in the main core is in a transmission state;
SlaveIdle indicates that the message block in the slave core is in an idle and allocable state, STransferring indicates that the message block in the slave core is in a transmission state, slaveReady indicates that the message block in the slave core is in a ready and usable state, and SlaveLocked indicates that the message block in the slave core is in a locking state;
The interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
m1, mAllocateMsg(), obtaining the address of a message block in the master core part of a message queue;
m2, mSendMsg(), starting the master core to transfer a message to the slave core;
m3, mRecvMsg(), receiving a message sent by the slave core;
m4, mReleaseMsg(), releasing a message of the master core part;
the interfaces the message queue system provides for the slave core application program include:
s1, sRecvMsg(), receiving a message sent by the master core;
s2, sReleaseMsg(), releasing a message of the slave core part;
s3, sAllocateMsg(), obtaining the address of a message block in the slave core part of a message queue;
s4, sSendMsg(), starting the slave core to transmit a message to the master core;
among the above interfaces, M1, M2, S1 and S2 are used for the master core to transfer messages to the slave core, and M3, M4, S3 and S4 are used for the slave core to transfer messages to the master core.
5. The local memory-based inter-master and slave messaging system according to claim 4, wherein the sequence of operations for transmitting a message from the master to the slave comprises:
a1, the master core application program calls mAllocateMsg(); the local memory-based message passing system between the master core and the slave core allocates the idle message block pointed to by the position index IMIdle in the master core part of the message queue, sets the block to the MasterLocked state, cyclically advances IMIdle, and returns the block address MasterMsg to the master core application program;
a2, the master core application program writes the message to be sent into the idle message block pointed to by MasterMsg;
a3, the master core application program calls mSendMsg(); the local memory-based message passing system between the master core and the slave core obtains the first message block MasterMsg pointed to by the position index IMLocked, sets the message block MasterMsg to the MasterReady state, and cyclically advances IMLocked;
a4, the local memory-based message passing system between the master core and the slave core, at an appropriate time, allocates for the message block to be transmitted the idle message block storage space SlaveMsg pointed to by the position index ISIdle in the slave core part, and cyclically advances ISIdle; it acquires the first message block MasterMsg pointed to by the position index IMReady, sets the message block MasterMsg to the MTransferring state, sets SlaveMsg to the STransferring state, and starts DMA to transfer the message block in MasterMsg to SlaveMsg; after the DMA transfer finishes, the local memory-based message passing system between the master core and the slave core sets the message block SlaveMsg pointed to by the slave core message block position index ISTran to the SlaveReady state, and sets the message block MasterMsg pointed to by the master core message block position index IMTran to the MasterIdle state;
a5, the slave core application program calls sRecvMsg(); the message queue returns the message block SlaveMsg pointed to by the position index ISReady of the slave core part to the slave core application program, and sets the message block SlaveMsg to the SlaveLocked state;
a6, the slave core application program reads the content in SlaveMsg;
a7, the slave core application program calls sReleaseMsg(); the message queue sets the slave core message block SlaveMsg to the SlaveIdle state;
the sequence of operations for the slave core to send a message to the master core includes:
b1, the slave core application program calls sAllocateMsg(); the local memory-based message passing system between the master core and the slave core allocates the idle message block pointed to by the position index ISIdle in the slave core part of the message queue, sets the block to the SlaveLocked state, cyclically advances ISIdle, and returns the block address SlaveMsg to the slave core application program;
b2, the slave core application program writes the message to be sent into the idle message block pointed to by SlaveMsg;
b3, the slave core application program calls sSendMsg(); the local memory-based message passing system between the master core and the slave core obtains the first message block SlaveMsg pointed to by the position index ISLocked, sets the message block SlaveMsg to the SlaveReady state, and cyclically advances ISLocked;
b4, the local memory-based message passing system between the master core and the slave core, at an appropriate time, allocates for the message block to be transmitted from the slave core part the idle message block storage space MasterMsg pointed to by the position index IMIdle; it sets the message block MasterMsg to the MTransferring state, sets SlaveMsg to the STransferring state, and starts DMA to transfer the message block in SlaveMsg to MasterMsg; after the DMA transfer finishes, the local memory-based message passing system between the master core and the slave core sets the MasterMsg pointed to by the master core message block position index IMTran to the MasterReady state, and sets the SlaveMsg pointed to by the slave core message block position index ISTran to the SlaveIdle state;
b5, the master core application program calls mRecvMsg(); the message queue returns the message block address MasterMsg pointed to by the position index IMReady of the master core part to the master core application program;
b6, the master core application program reads the content in MasterMsg;
b7, the master core application program calls mReleaseMsg(); the message queue sets the master core message block MasterMsg to the MasterIdle state.
6. The local memory-based message passing system between a master core and a slave core according to claim 4, wherein the blocking message transmission procedure between the master core and the slave core is specifically as follows:
Message queues will maintain a set of DMA requests DMAReqs in each message queue; the set is initialized to an empty set;
the slave core application program calls the interface sRecvMsg(); within sRecvMsg(), the following steps are performed:
A1. Judge whether the DMA request set DMAReqs of the message queue is empty; if it is not empty, execute step A2, otherwise execute step A3;
A2. Check each request req in DMAReqs in turn and check whether the request req has completed its DMA; if not, ignore it; if it has completed, set the req.SMsg state to SlaveReady, set the req.MMsg state to MasterIdle, and remove req from DMAReqs;
A3. Judge whether a message block SMsg in the SlaveIdle state can be obtained from the slave core part; if so, execute step A4, otherwise directly execute step A5;
A4. Set the state of the message block corresponding to MMsg to the MTransferring state, set the state of the message block corresponding to SMsg to the STransferring state, start an asynchronous DMA request of MsgSize bytes from MMsg to SMsg, add req to DMAReqs, and execute step A3 again;
A5. If a message in the slave core part is in the SlaveReady state, set the earliest SlaveReady message Msg to the SlaveLocked state, return Msg to the application program and end; otherwise, execute step A1;
wherein the DMA request set DMAReqs is initialized to null;
the slave core application program calls the interface sSendMsg(); within sSendMsg(), the following steps are performed:
B1. Judge whether the DMA request set DMAReqs of the message queue is empty; if it is not empty, execute step B2, otherwise execute step B3;
B2. Check each request req in DMAReqs in turn and check whether the request req has completed its DMA; if not, ignore it; if it has completed, set the req.SMsg state to SlaveIdle, set the req.MMsg state to MasterReady, and remove req from DMAReqs;
B3. Judge whether a message block SMsg in the SlaveReady state can be obtained from the slave core part; if so, execute step B4, otherwise directly execute step B5;
B4. Set the state of the message block corresponding to MMsg to the MTransferring state, set the state of the message block corresponding to SMsg to the STransferring state, start an asynchronous DMA request of MsgSize bytes from SMsg to MMsg, add req to DMAReqs, and execute step B3 again;
B5. If the message to be sent in the slave core part is in the SlaveLocked state, set that SlaveLocked message Msg to the SlaveReady state, return Msg to the application program and end; otherwise, execute step B1;
wherein the DMA request set DMAreqs is initialized to null.
7. The local memory-based inter-master and slave core messaging system according to claim 6, wherein the slave core accesses the master core's memory space in two different ways, direct access and asynchronous DMA transfer; the asynchronous DMA transmission mode comprises two steps of starting a DMA transmission process and inquiring a DMA result; after starting DMA transmission, the software system finishes other works without waiting for the end of DMA, and knows whether the DMA is finished or not by inquiring the DMA result;
in the blocking send/receive process between the master core and the slave core, the call returns only after the slave core has received the message of the master core; otherwise it keeps waiting for the master core to send a message;
when the slave core receives messages, it starts the DMA transfer process for the messages in the MasterReady state in the master core part; when the slave core has two or more message blocks and the speed at which the master core sends messages is higher than the speed at which the slave core consumes them, the reading of messages by the slave core application program and the DMA transfer process are completed in parallel.
8. The local memory-based message passing system between a master core and a slave core according to claim 1, wherein the message queue is created by the master core, and a new queue handle is generated in both the master core and the slave core; on the master core side, handles are kept partitioned by slave core number; on the slave core side, a unique queue entity can be determined by the handle number handle or the queue name qName; the handle of the same queue on the master core is the same as the handle on the slave core, that is, the queue identified on the master core by (slaveID, handle) and the queue identified on slave core slaveID by handle are the same queue entity; the state of a specific message queue can be queried through its identification number handle, mainly including whether the queue exists, its direction, the message size, and the number of messages currently in the queue;
the interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
m5, mQueryQueue (), inquiring whether a message queue exists;
m6, mQueueDirection (), obtaining the queue direction of the message queue;
m7, mQueueMsgNumInMaster (), obtaining the number of messages which can be accommodated by a control core part of a message queue;
m8, mQueueMsgNumInSlave(), obtaining the number of messages which can be accommodated by the computing core part of a message queue;
m9, mQueueMsgSize(), obtaining the maximum number of bytes of each message in a message queue;
m10, mQueueMsgSlaveMemType(), acquiring the memory type of the slave core part in a message queue;
m11, mQueueMsgNumStatus (), obtaining the dynamic information of the message queue;
m12, mCreateQueue (), creating a message queue;
the interface provided by the local memory-based messaging system between the master and slave cores for the slave core application includes:
s5, sQueryQueue (), inquiring whether a message queue exists;
s6, sQueueDirection () is carried out to obtain the queue direction of the message queue;
s7, sQueueMsgNumInMaster (), obtaining the number of messages which can be accommodated by a control core part of a message queue;
s8, sQueueMsgNumInSlave (), acquiring the number of messages which can be accommodated by a computing core part of a message queue;
s9, sQueueMsgSize(), obtaining the maximum number of bytes of each message in a message queue;
s10, sQueueMsgSlaveMemType(), acquiring the memory type of the slave core part in a message queue;
s11, sQueueMsgNumStatus (), obtaining dynamic information of a message queue;
The interfaces M5-M12 are used for inquiring the related message queue information on the master core, and the interfaces S5-S11 are used for inquiring the related message queue information on the slave core.
9. The local memory-based message passing system between a master core and a slave core of claim 1 wherein a unique message queue handle is generated when a user creates a message queue, the unique message queue being obtained by a handle number or a queue name;
the corresponding state information of a message queue can be obtained on both the master core side and the slave core side; because the master core communicates with multiple slave cores, it determines a unique queue entity by (slave core number, handle number) or (slave core number, queue name); on the slave core, the handle number handle or the queue name qName determines the unique queue entity.
10. The local memory-based message passing system between a master core and a slave core according to claim 1, wherein interfaces are provided for different high-performance many-core processors, covering the steps required for communication between the master core and the slave core, including a slave core management mechanism on the master core; with these interfaces, code can be quickly ported to a plurality of high-performance many-core processors while performing the corresponding functions; when the code is ported to a new platform, it only needs to be recompiled, with the compilation options of the corresponding platform specified during compilation;
The interface provided by the local memory-based message passing system between the master core and the slave core for the master core application program comprises:
m13, mHaltDevice(), exiting the running environment;
m14, mHMessQueueInit(), initialization method on the control core;
m15, mHMessQueueQuit(), deregistration method on the control core;
m16, mLoadDatFile(), loading an image file to the device, needed only on MT3;
m17, mUnloadDatFile(), unloading the image file from the device, needed only on MT3;
m18, mGetSlaveCoreNum(), obtaining the number of computing cores;
m19, mGetMSize(), obtaining the memory sizes of the control core and the computing core, in bytes;
m20, mGetSlaveSIMDLanes(), obtaining the number of lanes processed in parallel by the SIMD instructions of a computing core;
m21, mInitDevice(), loading the running environment of the acceleration device;
m22, mTinitThreadID(), obtaining the initialized thread data structure;
m23, mStartSlaveThread(), creating, starting and binding a thread group of computing cores;
m24, mWaitSlaveThreads(), waiting for the thread group to terminate;
m25, mDestroySlaveThreads(), closing the thread group;
m26, mSlaveThreadActive(), obtaining whether a computing core thread is active;
The interface provided by the local memory-based messaging system between the master and slave cores for the slave core application includes:
s12, sHMessQueueInit(), initializing the message queue mechanism on the slave core;
s13, deregistration method of the computing core part cache;
s14, sGetSlaveNum(), obtaining the number of computing cores;
s15, sGetSlaveID(), obtaining the number of the current computing core;
s16, obtaining the maximum number of bytes of each message in a message queue;
s17, sSIMDLanes(), obtaining the number of lanes processed in parallel by the SIMD instructions of a computing core;
M13-M20 are used for querying related information on the master core, M21-M26 are used for managing slave core threads on the master core, and S12-S20 are used for querying related information on the slave core.
CN202310075604.1A 2023-02-01 2023-02-01 Message transmission system between master core and slave core based on local memory Pending CN116302592A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310075604.1A CN116302592A (en) 2023-02-01 2023-02-01 Message transmission system between master core and slave core based on local memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310075604.1A CN116302592A (en) 2023-02-01 2023-02-01 Message transmission system between master core and slave core based on local memory

Publications (1)

Publication Number Publication Date
CN116302592A true CN116302592A (en) 2023-06-23

Family

ID=86836873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310075604.1A Pending CN116302592A (en) 2023-02-01 2023-02-01 Message transmission system between master core and slave core based on local memory

Country Status (1)

Country Link
CN (1) CN116302592A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination