CN102331923B - Multi-core and multi-threading processor-based functional macropipeline implementing method - Google Patents

Multi-core and multi-threading processor-based functional macropipeline implementing method

Info

Publication number
CN102331923B
CN102331923B CN201110309287.2A
Authority
CN
China
Prior art keywords
thread
queue
cluster
packet
jump
Prior art date
Legal status
Expired - Fee Related
Application number
CN201110309287.2A
Other languages
Chinese (zh)
Other versions
CN102331923A (en)
Inventor
Li Kang
Zhao Qinghe
Lei Li
Fan Yong
Ma Peijun
Shi Jiangyi
Hao Yue
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201110309287.2A
Publication of CN102331923A
Application granted
Publication of CN102331923B

Landscapes

  • Multi Processors (AREA)

Abstract

The invention discloses a functional macropipeline implementing method based on a multi-core multi-threaded processor. A plurality of processors are divided into different clusters, namely a receiving cluster and a transmitting cluster, and the processors within each cluster are organized in parallel. The receiving cluster is responsible for receiving messages; a parallel structure is adopted inside the receiving cluster, and all packet receiving tasks are completed in parallel by a plurality of threads. The transmitting cluster is responsible for transmitting the messages, which includes checking whether a new data packet transmitting task is present, acquiring the queue descriptor information at the current head pointer after reading a new transmitting task, transmitting the data packet from the synchronous dynamic random access memory (SDRAM) unit specified by the descriptor to the specified transmitting buffer unit, and maintaining the queue head pointer for synchronous communication with the receiving cluster; a parallel structure is adopted inside the transmitting cluster, and all packet transmitting tasks are completed in parallel by a plurality of threads.

Description

A functional macropipeline implementation method based on a multi-core multi-threaded processor
Technical field
The present invention relates to a functional macropipeline implementation method based on a multi-core multi-threaded processor.
Background technology
As network bandwidth grows rapidly, the demand for programmability and multi-functionality in network entities such as routers, switches and gateways keeps increasing. Developing more capable applications that exploit the performance advantages of multi-core processors, so as to meet the demands of high-throughput, low-latency multimedia network communication, has become the main problem currently faced. Under the parallel processing structures of today's multi-core network processors, however, parallel network processing capability remains limited because the flexibility and efficiency of parallel programming are not well realized. Network parallel processing must reconcile the hardware parallel structure with the efficiency of parallel software execution, and compared with traditional single-processor structures it has its own peculiarities and complexities. Software developers must understand the hardware resources in depth, including the processor array, hardware threads, memories and their communication mechanisms, and manage them directly during system design. Programming a parallel network processor requires the application developer to understand the operating mechanism of threads, thread allocation, thread synchronization, thread management and scheduling, thread load balancing, and how to map a particular problem onto multiple threads for parallel execution; development and debugging are both difficult, even though these matters are traditionally handled by an operating system. Excessive hardware detail thus becomes a yoke on application development, and the powerful parallel performance of multi-core multi-threaded processors cannot be brought into play. In addition, existing network processor architectures and development languages all differ; they neither hide the underlying hardware differences nor provide a unified overall development interface, so an application developed for one network processor cannot be quickly ported to another network processor hardware platform.
Research on application development for multi-core network processors is already under way at home and abroad, but owing to the structural particularity of the network processor (NP), existing methods usually lack a unified high-level programming model and remain imperfect. Moreover, few methods address adaptation to different NP architectures, so applications cannot be ported between NP platforms. This greatly limits the application and development of network processor technology and also prevents multi-core multi-threaded network processors from realizing their potential advantages.
Therefore, for multi-core network processor application development, an effective parallel processing method or technique is needed to hide the hardware architecture differences between network processors and fully exploit the parallel performance advantages of the multi-core multi-threaded architecture. Meanwhile, in network application development based on a multi-core processor structure, the allocation of message tasks and the implementation of inter-thread communication must be controlled by the developer to guarantee load balancing across processing units and preservation of packet order; the distribution of message tasks must balance both. A network processor also contains many types of shared resources, such as memory and transmit buffers, so the use of each shared resource must be arbitrated and made mutually exclusive. An effective method or technique must solve the two key problems of packet processing in a network processor: one is the preservation of packet order, the other is mutual exclusion when multiple threads access the same data structure in parallel.
Summary of the invention
In order to overcome the above defects in network processor software development, solve the key problems of network message processing, and give full play to the parallel processing performance of multi-core multi-threaded hardware, the present invention provides a functional macropipeline implementation method based on a multi-core multi-threaded shared-memory hardware structure. The functional macropipeline proposed by the present invention is a pipelining technique applied between processors: the processors are connected into a pipeline according to the different functions they perform in packet processing. First, a task allocation and scheduling method for parallel message reception, processing and transmission on multiple cores and threads guarantees load balancing and parallel processing across the microprocessors and effectively maintains the high throughput of the multi-core network processor system. Second, a synchronous communication method between hardware threads effectively guarantees the transfer of state information and the preservation of packet order across receive and transmit processing, solving the synchronous communication problem between threads. Third, a method for synchronized access to the shared resources of the multiprocessor system is designed; a specially designed mutual exclusion mechanism guarantees synchronization when multiple processor threads access the same data structure. In other respects, the model proposed by the present invention effectively hides the hardware architecture differences between network processors and provides a unified development model: an application development interface for network processors that is easy to maintain, easy to extend, efficient and portable.
The method is based on a multi-core multi-threaded network processor structure in which multiple processor units share a memory hierarchy. Each microprocessor supports hardware multi-threaded parallel processing, and the instructions executed by a microprocessor in different cycles may come from different threads. A thread can actively switch by an instruction, transferring execution to the next hardware thread and thereby hiding the access latency of slow devices. The memory system adopts a hierarchical structure composed of distributed memories and shared memories according to the different data types, improving both the parallelism and the throughput of memory access. On the basis of this parallel processor with a shared hierarchical memory system, a functional macropipeline structure is realized by combining hardware and software, improving the parallelism and overall throughput of the packet processing process.
The macropipeline division method targets the processing steps of IP packets: the packet processing function is divided, and the subfunctions are mapped onto different processors. The processors are divided into different clusters: a receiving cluster and a transmitting cluster. Within a cluster the processors are organized in parallel, while the clusters are connected according to the functional sequence to form a serial pipeline, each cluster being responsible for its own part of the packet processing. The aspects that the realization of the functional macropipeline must consider carefully include: how to schedule packet processing tasks in parallel; how to achieve fast and reliable communication and synchronization between hardware threads; and how to achieve data communication and synchronization between the transmitting and receiving clusters of the pipeline. The present invention adopts the following techniques for these aspects.
A parallel task scheduling method for packet processing: the threads of the processors that perform the same function within a cluster form a thread pool, and a global scheduling method allocates thread tasks in a unified way. All idle threads are placed in the pool, and the two clusters adopt different idle-thread scheduling modes. The receiving cluster uses round-robin scheduling to assign each new packet reception task to an idle thread in the pool, while the transmitting cluster uses an independent arbitration thread (which does not enter the pool) that continuously assigns new transmission tasks to idle threads in the pool.
A method for on-chip communication between hardware threads: each thread is given an exclusive on-chip hardware storage unit (e.g., in on-chip RAM), and the threads that process the same network packet use this unit to communicate and transfer state information, guaranteeing the continuity of state across the threads handling the same packet. Threads in the same processor can communicate through on-chip memory for higher access speed, while threads on different processors can also communicate through shared off-chip memory. An atomic operation mechanism is provided at the same time to solve the synchronization problem of different threads accessing the same shared memory unit.
A communication strategy for the serial pipeline based on a multi-producer, multi-consumer model: the synchronous communication between the receiving cluster and the transmitting cluster is accomplished by a hardware shared queue. Each thread of the receiving cluster acts as a producer and maintains the tail pointer of the queue, while each thread of the transmitting cluster acts as a consumer and maintains the head pointer. This structure also provides a mutual exclusion mechanism based on hardware read-operation locking, solving the synchronization problem when threads within the same cluster or across clusters access the same queue unit.
Embodiments of the invention have the following beneficial effects:
(1) The functional macropipeline adopted in the present invention is configured as a serial-parallel hybrid structure, overcoming the shortcomings and bottlenecks of traditional purely serial or purely parallel structures and making the load balancing and parallel scheduling of each processor easier to guarantee.
(2) The present invention adopts a parallel message task scheduling method that effectively improves load balancing between threads; the parallel operation of the threads effectively hides memory access latency, increases the packet processing speed, and significantly improves system throughput.
(3) synchronous method of the intercommunication of a kind of hardware thread of the present invention's employing, the thread communication of the hierarchy of memory shared based on multiprocessor can solve the synchronous communication problem of each cross-thread of concurrent working effectively, ensure the effective transmission of Packet State information between different threads, ensure that the information continuity received in transmission processing.
(4) The serial-pipeline communication strategy based on the multi-producer, multi-consumer model adopted by the present invention effectively solves the data synchronization problem between reception and transmission and effectively preserves packet order; the accompanying mutual exclusion mechanism based on read-operation locking effectively solves the mutual exclusion problem for shared queue units.
In short, through the above scheme, this programming model not only solves the problems of parallel thread scheduling and message task allocation in application development for different network processors, guaranteeing the load balancing of each thread, but also efficiently solves shared-unit mutual exclusion, packet order preservation and thread synchronization; it shields the hardware details of different network processors and provides a unified hardware abstraction, with the advantages of easy development, easy maintenance, easy extension and easy porting.
Accompanying drawing explanation
The above and other objects, features and advantages of the present invention will become apparent from the following detailed description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is an architecture diagram of a multi-core multi-threaded parallel processor;
Fig. 2 is a structural diagram of the macropipeline based on the division of the IP packet processing function;
Fig. 3 is a diagram of the improved message reception task assignment policy based on the thread pool model;
Fig. 4 is a diagram of the improved message transmission task assignment policy based on the thread pool model;
Fig. 5 is a segmentation diagram of an Ethernet data frame according to an embodiment of the present invention;
Fig. 6a is a flow chart of thread communication based on shared on-chip memory according to an embodiment of the present invention;
Fig. 6b is a structural diagram of a shared on-chip memory cell according to an embodiment of the present invention;
Fig. 7a is a structural diagram of the shared queue of the serial pipeline based on the multi-producer, multi-consumer model;
Fig. 7b is a structural diagram of a queue descriptor according to an embodiment of the present invention;
Fig. 8 is a flow chart of the synchronization mechanism based on a read-operation mutex lock according to an embodiment of the present invention.
Embodiments
The parallel programming model based on a multi-core network processor according to embodiments of the present invention is described below with reference to the accompanying drawings, in which the same reference numerals denote the same elements throughout. It should be understood that the embodiments described herein are merely illustrative and should not be interpreted as limiting the scope of the invention.
Fig. 1 is a schematic diagram of a multi-core multi-threaded network processor architecture 100, comprising a general-purpose processor 102, packet processing microprocessors 104, a memory hierarchy 106, a coprocessor system 108, and a data exchange system 110.
The functions completed by the general-purpose processor 102 mainly include system startup and initialization, code loading, control and management functions, and the processing of some upper-layer protocols and abnormal packets. Functions at the control and management level are given to the general-purpose microprocessor, while functions at the data forwarding level are given to the microprocessors 104. In network processors without an embedded general-purpose microprocessor, these functions are completed by an external host or by a dedicated coprocessor.
The microprocessor 104 is the core unit of the network processor. It is in fact a function-specific microprocessor core that mainly completes the work of packet forwarding at the data forwarding level. Every microprocessor supports fine-grained multithreading, a characteristic that supports the parallelism of network packet processing, with the switching overhead between threads kept close to zero by hardware. When a thread waits on a slow operation such as a memory read or write, execution can switch to another thread, hiding the access latency of the slow device; each thread executes its instructions at full speed until a memory access causes a thread switch, so the processor runs continuously, avoids waiting on memory, and overall system throughput is improved.
The memory hierarchy 106 comprises distributed memories such as the microprocessor register files and on-chip memories, shared memories such as off-chip SRAM and SDRAM, and the internal cache of the network processor. Since data of different purposes place different requirements on memory latency and bandwidth, a hierarchical storage mechanism is adopted. Inter-thread communication can be carried out through the shared on-chip memory, guaranteeing data synchronization and state information transfer during packet processing.
The coprocessor system 108 is a set of hardware acceleration units that generally complete the functions in packet processing that are executed frequently or are relatively complex, such as CRC checking and calculation and routing table lookup.
The data exchange system 110 connects to external units 112 such as MAC chips and mainly completes the data exchange between the network layer and the data link layer. It is responsible for receiving the datagram bit stream from the physical layer, forming packets and storing them in the packet buffer; under the scheduling of the network processor it completes functions such as data reception and buffering, and transmission and buffering.
The multi-core multi-threaded network processor architecture proposed by the present invention is a general multi-core network processor hardware architecture. It provides a unified hardware platform for the functional macropipeline implementation method proposed by the present invention, shields the hardware details of different network processors, and provides a unified hardware abstraction that is easy to extend and to port.
Fig. 2 shows the division method 200 of the macropipeline based on the packet processing function according to an embodiment of the invention: the parallel packet processing functions form a macropipeline structure composed of a receiving cluster 202 and a transmitting cluster 204. According to the functions of the different microprocessors in the packet processing flow, the present invention adopts a serial-parallel hybrid arrangement of microprocessors: the packet processing microprocessors 104 are divided into the receiving cluster 202 and the transmitting cluster 204; a serial pipeline structure is adopted between the two clusters, each cluster realizing a different function; and the clusters are connected in series by an interconnection mechanism into one pipeline, so that a packet completes all of its processing tasks after passing through the whole pipeline structure in turn.
The receiving cluster 202 is responsible for the reception processing of messages, including packet reception from the receive buffer FIFO, packet classification, layer-2 data link checks such as MAC address filtering and frame type verification, layer-3 network checks such as IP header verification (protocol, checksum, TTL), route lookup, packet header modification (updating the TTL and checksum), and buffering the packet to off-chip SDRAM. After processing a packet, it must also write a new queue descriptor at the address of the current queue tail pointer and maintain the queue tail pointer for synchronous communication with the transmitting cluster 204. A parallel structure is adopted inside the receiving cluster, and multiple threads complete all packet reception processing tasks in parallel.
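As a concrete illustration of the layer-3 checks named above, the following minimal C sketch verifies the IPv4 header fields mentioned (version, checksum, TTL), with field offsets per RFC 791. It is our illustration, not code from the patent; all names are ours, and the protocol check is left as a stub since the set of supported protocols is configuration-dependent.

```c
#include <stdint.h>
#include <stddef.h>

/* Ones'-complement sum over the IPv4 header; returns 0 iff the stored
 * header checksum is valid. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t ihl_bytes)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < ihl_bytes; i += 2)      /* 16-bit big-endian words */
        sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
    while (sum >> 16)                              /* fold the carries        */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 0 if the header passes; a nonzero value names the failing check. */
int ipv4_header_check(const uint8_t *hdr)
{
    size_t ihl = (size_t)(hdr[0] & 0x0F) * 4;      /* header length in bytes  */
    if ((hdr[0] >> 4) != 4 || ihl < 20) return 1;  /* version / length        */
    if (ipv4_checksum(hdr, ihl) != 0)   return 2;  /* header checksum         */
    if (hdr[8] <= 1)                    return 3;  /* TTL exhausted           */
    /* hdr[9] is the protocol field; validate against supported protocols */
    return 0;
}
```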
The transmitting cluster 204 is responsible for the transmission processing of messages, including checking whether a new packet transmission task exists, obtaining the queue descriptor information at the current head pointer after reading a new transmission task, transmitting the packet from the SDRAM unit specified by the descriptor to the designated transmit buffer unit, and maintaining the queue head pointer for synchronous communication with the receiving cluster 202. A parallel structure is adopted inside the transmitting cluster, and multiple threads complete all packet transmission processing tasks in parallel.
The packet processing function division method based on the macropipeline proposed by the present invention overcomes the shortcomings and bottlenecks of traditional purely serial or purely parallel structures; it scales well, makes full use of the bus and processor resources, and makes the load balancing and parallel scheduling of each processor easier to guarantee, so that the system achieves higher throughput. At the same time, the realization of the proposed functional macropipeline technique relies on the following three technical methods to solve the key problems of packet processing.
The parallel packet processing task scheduling method of the embodiment is shown in Fig. 3 and Fig. 4. The allocation of message tasks in both the receiving cluster 202 and the transmitting cluster 204 is based on dividing a packet into several 64-byte micro-packets; the data frame 502 shown in Fig. 5, for example, is divided into three micro-packets. Different micro-packet processing tasks are assigned to different threads, and the threads carry out the packet processing tasks in parallel. The threads must communicate synchronously to transfer packet processing state information, so as to guarantee the continuity of reception, buffering and transmission for the same packet.
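The 64-byte division can be made concrete with a short sketch; the arithmetic below reproduces the Fig. 5 example, in which a 180-byte frame yields three micro-packets of 64, 64 and 52 bytes. The names are illustrative, not from the patent.

```c
#define MPKT_SIZE 64u   /* micro-packet size used by both clusters */

/* Number of micro-packets for a frame of len bytes (len > 0 assumed). */
static inline unsigned mpkt_count(unsigned len)
{
    return (len + MPKT_SIZE - 1) / MPKT_SIZE;
}

/* Byte count carried by micro-packet i (0-based) of a len-byte frame. */
static inline unsigned mpkt_bytes(unsigned len, unsigned i)
{
    unsigned last = mpkt_count(len) - 1;
    return (i < last) ? MPKT_SIZE : len - last * MPKT_SIZE;
}

/* The 180-byte frame of Fig. 5: mpkt_count(180) == 3, and the three
 * micro-packets carry 64, 64 and 52 bytes respectively. */
```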
The message reception task assignment policy of the receiving cluster 202 is shown in Fig. 3. The traditional thread pool model is used to distribute micro-packet processing tasks: the micro-packet reception tasks of the same packet are assigned to different threads for processing. After a thread completes a reception task it enters the receiving cluster thread pool 302 and waits for a new packet processing task; tasks are assigned to idle threads in round-robin fashion. After the current thread obtains the reception task for one micro-packet of an IP packet 304, it sends a semaphore to the next thread to inform it that a task may be assigned. The reception task assignment of the data frame 502 shown in Fig. 5 proceeds as follows: after thread 0 obtains the processing task for the first micro-packet 504 of the packet, it sends a semaphore notifying thread 1 to take the processing task for micro-packet 506; after thread 1 obtains its task, it in turn notifies thread 2 to process the last micro-packet 508. In order to transfer state information between the different receiving threads that process the same packet and to guarantee that the micro-packets composing it are received in order, we provide a synchronous communication strategy based on on-chip memory that guarantees the continuity of the reception state.
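The round-robin handoff described above can be sketched as follows. The signal, FIFO and processing hooks are hypothetical stand-ins for the hardware semaphore and receive path of the embodiment; the sketch shows only the ordering discipline, under the assumption of one loop per pool thread.

```c
/* Hypothetical platform hooks standing in for hardware facilities:
 * inter-thread signals and the receive FIFO of Fig. 3. */
extern void wait_signal(int thread_id);       /* block until signalled    */
extern void send_signal(int thread_id);       /* signal another thread    */
extern void rx_fifo_get_mpacket(void *buf);   /* pull one 64-byte mpacket */
extern void process_mpacket(const void *buf); /* classify, check, buffer  */

#define NUM_RX_THREADS 8

/* Round-robin receive loop: each pool thread waits for its turn, claims
 * the next micro-packet, then immediately signals the next thread, so
 * the micro-packets of a packet are taken up in order while the heavy
 * processing itself runs in parallel. */
void rx_thread(int id)
{
    unsigned char mpkt[64];
    for (;;) {
        wait_signal(id);                         /* my turn for a task   */
        rx_fifo_get_mpacket(mpkt);               /* claim micro-packet   */
        send_signal((id + 1) % NUM_RX_THREADS);  /* pass the turn onward */
        process_mpacket(mpkt);                   /* parallel processing  */
    }
}
```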
The message transmission task assignment policy of the transmitting cluster 204 is shown in Fig. 4. An improved thread pool model is used to distribute micro-packet processing tasks: the micro-packet transmission tasks of the same IP packet 402 are assigned to different threads for processing. We provide an independent transmit arbitration thread 404 that assigns tasks to the idle threads of the transmit thread pool 406. The arbitration thread continuously checks whether a new transmission task exists and, if so, assigns it to an idle thread; after a processing thread completes a transmission task it re-enters the idle thread pool and waits to be assigned a new task to start a new round of processing. The transmit arbitration thread 404 can communicate with the other threads through on-chip memory cells, guaranteeing the balanced assignment of packet processing tasks. Likewise, we provide a synchronous communication strategy based on on-chip memory that guarantees the continuity of the transmission state.
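The arbitration scheme can be sketched in the same style; the task descriptor, mailbox and idle-pool hooks below are hypothetical stand-ins for the on-chip memory communication the embodiment describes.

```c
/* Hypothetical descriptor for one pending send task. */
struct task { unsigned desc_addr; };  /* e.g. a queue-descriptor address  */

/* Hypothetical hooks standing in for on-chip mailboxes and the idle pool. */
extern int  poll_new_send_task(struct task *t);  /* nonzero if a task exists */
extern int  pop_idle_thread(void);               /* blocking: next idle tid  */
extern void mailbox_post(int tid, const struct task *t); /* hand a task over */
extern void mark_idle(int self);                 /* re-enter the idle pool   */
extern void wait_task(int self, struct task *t); /* block until task posted  */
extern void do_send(const struct task *t);       /* SDRAM -> TX buffer move  */

/* Arbitration thread (never enters the pool): keeps checking for new
 * send tasks and dispatches each one to the next idle worker thread. */
void tx_arbiter(void)
{
    struct task t;
    for (;;)
        if (poll_new_send_task(&t))
            mailbox_post(pop_idle_thread(), &t);
}

/* Worker thread: finish a send task, then return to the idle pool. */
void tx_thread(int self)
{
    struct task t;
    for (;;) {
        mark_idle(self);
        wait_task(self, &t);
        do_send(&t);
    }
}
```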
The parallel packet processing task scheduling method of the embodiment can substantially improve the load balancing between parallel threads, and the parallel operation of the threads effectively hides the access latency of slow devices, increasing the packet processing speed and significantly improving system throughput.
The flow of the method for on-chip hardware thread intercommunication of the embodiment of the present invention is shown in Fig. 6a. Each thread has an on-chip memory cell (shown in Fig. 6b) that stores the state information relevant to packet processing, such as the buffer address, the number of discarded micro-packets, the micro-packet sequence number within the packet, and the packet order number (the programmer may define the concrete state information as needed). The most significant bit of the cell is a valid bit: 1 indicates that the state information in the cell is fresh and valid, while 0 indicates that the state information is invalid and new valid information must be awaited. We provide an atomic operation mechanism that operates on this most significant bit, reading the information in the shared memory cell and then clearing the bit, to solve the synchronization problem of different threads accessing the same shared memory cell. The threads that process the same packet use this cell to communicate and transfer processing state information.
The thread communication flow shown in Fig. 6a specifically comprises the following steps (a code sketch follows the steps):
Step 602: after a thread obtains the packet processing task for a packet, it first checks the class of the micro-packet to be processed, then jumps to step 604;
Step 604: judge whether the obtained task is the first micro-packet of the packet; if so, jump to step 612; if not, jump to step 606;
Step 606: read the shared memory cell of the current packet processing thread, then jump to step 608;
Step 608: judge whether the information read is valid, i.e., whether the most significant bit is 1; if valid, jump to step 610; if invalid, jump back to step 606 and wait for the data to become valid;
Step 610: judge whether the obtained task is a middle micro-packet of the packet; if so, jump to step 612; if not, jump to step 614;
Step 612: invalidate the thread's own shared memory cell, then update the state information in the shared memory cell of the next thread and mark it valid, then jump to step 616;
Step 614: invalidate the thread's own shared memory cell, then write a null value to the shared memory cell of the next receiving thread and mark it valid, then jump to step 616;
Step 616: perform the concrete processing of the packet processing task.
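A compact way to read steps 602-616 is the following C sketch, in which C11 atomics stand in for the hardware atomic read-and-clear of the valid bit; the cell layout, thread count and names are illustrative assumptions, not the patent's own code.

```c
#include <stdint.h>
#include <stdatomic.h>

#define NUM_THREADS 8
#define VALID (1u << 31)                  /* most significant bit = valid */
enum mpkt_kind { FIRST, MIDDLE, LAST };   /* class checked in step 602    */

/* One on-chip cell per thread (Fig. 6b), modeled as a 32-bit word:
 * bit 31 is the valid flag, the low bits carry state such as a buffer
 * address or micro-packet sequence number. */
extern _Atomic uint32_t comm_cell[NUM_THREADS];

/* Hardware-style atomic: read the cell and clear its valid bit in one step. */
static uint32_t read_and_clear(int tid)
{
    return atomic_fetch_and(&comm_cell[tid], ~VALID);
}

/* Steps 604-616 for one micro-packet task of a packet. */
void handle_mpacket(int self, int next, enum mpkt_kind kind, uint32_t state)
{
    if (kind == FIRST) {
        atomic_store(&comm_cell[self], 0);       /* invalidate own cell   */
    } else {
        uint32_t v;
        do { v = read_and_clear(self); }         /* steps 606-608: read,  */
        while (!(v & VALID));                    /* wait until predecessor
                                                    wrote valid data      */
        state = v & ~VALID;                      /* inherit packet state  */
    }
    if (kind == LAST)
        atomic_store(&comm_cell[next], VALID);           /* step 614: null */
    else
        atomic_store(&comm_cell[next], VALID | state);   /* step 612       */
    /* step 616: perform the concrete packet processing here */
}
```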
The flow is illustrated below with the reception of the 180-byte packet shown in Fig. 5 as an example:
(1) If the current thread processes the first micro-packet — for example, thread 0 processes the first micro-packet 504 — it first invalidates its own shared memory cell by clearing the most significant bit, then writes the updated state information it has obtained, such as the buffer address, into the shared memory cell of the next receiving thread 1 and marks that cell valid by setting the most significant bit to 1.
(2) If the current thread processes a middle micro-packet of the packet (neither first nor last) — for example, thread 1 processes the second micro-packet 506 — it first reads the state information from its own shared memory cell and invalidates it; here the atomic operation mechanism is used to read the data in the cell and clear the most significant bit in one step. It then judges whether the data read is valid; if invalid, it waits until the cell becomes valid, i.e., until the previous thread has written valid information; otherwise it writes the updated state information into the shared memory cell of the next receiving thread 2 and marks it valid.
(3) If the current thread processes the last micro-packet 508 — for example, thread 2 processes the last micro-packet — it first uses the atomic operation to read the state information of its own shared memory cell while clearing the most significant bit to invalidate it, and judges whether the data read is valid; if invalid, it waits until the previous thread has written valid information; otherwise it writes a null value to the shared memory cell of the next receiving thread 3 and marks it valid. Thread 3 then starts on the first micro-packet of a new packet, repeating the operation of (1).
The thread communication strategy based on the shared memory hierarchy proposed by the present invention effectively solves the synchronous communication problem between concurrently working threads. Threads on the same microprocessor obtain faster access speed by using distributed memory, improving system throughput, while the effective transfer of packet state information between threads guarantees information continuity across receive and transmit processing.
The shared queue structure 700 of the serial pipeline based on the multi-producer, multi-consumer model of the embodiment is shown in Fig. 7a. The shared queue is used for synchronous communication between the receiving cluster 202 and the transmitting cluster 204 and transfers valid packet state information. The elementary unit of the queue is the queue descriptor 714 shown in Fig. 7b, where the queue base address 702 is the first address of the queue units. The packet information stored includes the packet buffer address, the number of valid bytes in the last micro-packet, the number of micro-packets in the packet, the packet order number, and so on; one of ordinary skill may configure the queue descriptor as 2 to 4 32-bit words as needed and add further state information. Here it is assumed that a queue descriptor is two 32-bit words, i.e., one descriptor occupies two 32-bit storage units.
Each thread of the receiving cluster acts as a producer and maintains the tail pointer 704 of the queue. Its write operation 710 takes place after the reception processing of a packet is complete: it updates the two consecutive shared on-chip memory cells pointed to by the tail pointer, writing into the queue a descriptor composed of the packet state information to be passed to the sending thread, and then increments the tail pointer 704 by 2 to point to the next free unit 708 of the queue. Each thread of the transmitting cluster acts as a consumer and maintains the head pointer 706 of the queue. When it starts to process the first micro-packet of a packet, its read operation 712 first reads the queue descriptor in the two consecutive shared on-chip memory cells pointed to by the head pointer 706 to obtain the packet state information, and then increments the head pointer 706 by 2 to point to the next valid queue descriptor.
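The descriptor layout and the producer/consumer pointer updates can be sketched as follows, assuming the two-word descriptor supposed above. The lock_ptr/unlock_ptr stubs correspond to the read-lock mechanism of Fig. 8 (sketched further below); empty/full handling and the exact packing of the meta word are omitted as assumptions of ours.

```c
#include <stdint.h>

/* Two-word queue descriptor (Fig. 7b): one word of buffer address, one
 * word packing micro-packet count, valid bytes of the last micro-packet
 * and the packet order number (packing illustrative). */
struct qdesc {
    uint32_t buf_addr;
    uint32_t meta;
};

#define QLEN 256                        /* descriptors in the ring        */
extern uint32_t queue_mem[QLEN * 2];    /* shared on-chip queue units     */
extern uint32_t q_tail, q_head;         /* stored in two on-chip cells    */
extern void lock_ptr(uint32_t *p);      /* read-lock on a pointer cell    */
extern void unlock_ptr(uint32_t *p);    /*   (see the Fig. 8 sketch)      */

/* Producer (receiving-cluster thread): write a descriptor at the tail
 * and advance the tail pointer by 2 words. */
void enqueue(const struct qdesc *d)
{
    lock_ptr(&q_tail);
    uint32_t t = q_tail;
    queue_mem[t]     = d->buf_addr;
    queue_mem[t + 1] = d->meta;
    q_tail = (t + 2) % (QLEN * 2);      /* point to next free unit        */
    unlock_ptr(&q_tail);
}

/* Consumer (transmitting-cluster thread): read the descriptor at the
 * head and advance the head pointer by 2 words. */
void dequeue(struct qdesc *d)
{
    lock_ptr(&q_head);
    uint32_t h = q_head;
    d->buf_addr = queue_mem[h];
    d->meta     = queue_mem[h + 1];
    q_head = (h + 2) % (QLEN * 2);      /* point to next descriptor       */
    unlock_ptr(&q_head);
}
```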
Because the mutual exclusion problem must be solved when threads within the same cluster and threads of different clusters access the same unit of the queue, the concrete mutual exclusion operation may use software variables or a read-lock mechanism based on hardware locks (relying on hardware CAM); the program designer may use either as the situation requires. Here we provide a mutual exclusion mechanism based on hardware read-operation locking, whose realization flow is shown in Fig. 8, guaranteeing mutually exclusive access by the threads to the same queue unit. Since accesses to the queue structure depend on the head and tail pointers, locking the pointers correctly is sufficient to guarantee mutually exclusive access; to this end the head and tail pointers are stored in two separate on-chip memory cells, and read locks are applied to these two on-chip memory cell addresses. Whenever a thread accesses the queue, whether for a read or a write operation, it must obtain the current pointer (head or tail): it first performs a read operation on the on-chip memory cell storing the pointer to obtain the queue pointer and lock that address; after writing the updated pointer value back to on-chip storage, it unlocks the address so that other threads needing to access the queue can obtain access rights.
The mutual exclusion flow based on read locking shown in Fig. 8 specifically comprises the following steps (a code sketch follows the steps):
Step 802: when a thread is about to access a queue unit, it attempts to lock the address of the pointer for that unit, then jumps to step 804;
Step 804: judge whether the address is locked by another thread; if so, jump to step 806; if not, jump to step 810;
Step 806: wait for the other access operation to unlock the address, then jump to step 808;
Step 808: judge whether the address has been unlocked; if the unlock has succeeded, jump to step 810; if not, jump back to step 806 and continue waiting for the address to be unlocked;
Step 810: lock the address of the pointer and begin the queue access operation, then jump to step 812;
Step 812: after the access ends, unlock the current address, allowing other threads that need to access the queue to obtain access rights.
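Steps 802-812 map naturally onto a test-and-set loop. The sketch below models the hardware read-operation lock (which, as noted above, may rely on hardware CAM) with a C11 atomic flag; this substitution and all names are our assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One lock flag per pointer address; a hardware read-lock is modeled
 * here with a C11 test-and-set flag (initialize with ATOMIC_FLAG_INIT). */
typedef struct {
    atomic_flag locked;
    uint32_t    ptr;       /* the queue head or tail pointer itself */
} locked_ptr_t;

/* Steps 802-810: spin until the address is unlocked, then lock it and
 * return the current pointer value so the queue access can begin. */
uint32_t lock_and_read(locked_ptr_t *lp)
{
    while (atomic_flag_test_and_set(&lp->locked))
        ;                  /* steps 804-808: wait for the unlock    */
    return lp->ptr;        /* step 810: access may begin            */
}

/* Step 812: write back the updated pointer and unlock the address so
 * other threads needing the queue can obtain access rights. */
void write_and_unlock(locked_ptr_t *lp, uint32_t new_ptr)
{
    lp->ptr = new_ptr;
    atomic_flag_clear(&lp->locked);
}
```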
The communication strategy of the serial pipeline based on the multi-producer, multi-consumer model proposed by the present invention effectively solves the data synchronization problem and preserves packet order; the mutual exclusion mechanism provided with it effectively solves the mutual exclusion problem for shared queue units.
The objects and technical solutions of the present invention have been described in detail above. It should be understood that the above description does not limit the scope of the invention; any modifications, improvements and the like made within the principles and technical basis of the present invention shall all be included within the protection scope of the present invention.

Claims (2)

1. A functional macropipeline implementation method based on a multi-core multi-threaded processor, characterized in that multiple processors are divided into different clusters: a receiving cluster and a transmitting cluster; the receiving cluster is responsible for the reception processing of messages, a parallel structure is adopted inside the receiving cluster, and multiple threads complete all packet reception processing tasks in parallel; the transmitting cluster is responsible for the transmission processing of messages, including checking whether a new packet transmission task exists, obtaining the queue descriptor information at the current head pointer after reading a new transmission task, transmitting the packet from the SDRAM unit specified by the descriptor to the designated transmit buffer unit, and maintaining the queue head pointer for synchronous communication with the receiving cluster; a parallel structure is adopted inside the transmitting cluster, and multiple threads complete all packet transmission processing tasks in parallel; the parallel packet processing task scheduling method of the receiving cluster and the transmitting cluster is as follows: the allocation of message tasks in both clusters is based on dividing a packet into several micro-packets, different micro-packet processing tasks are assigned to different threads, the threads carry out the packet processing tasks in parallel, and the threads communicate synchronously to transfer packet processing state information, so as to guarantee the continuity of reception, buffering and transmission for the same packet;
The method of synchronous communication between the threads is:
Step 602: after a thread obtains the packet processing task for a packet, it first checks the class of the micro-packet to be processed, then jumps to step 604;
Step 604: judge whether the obtained task is the first micro-packet of the packet; if so, jump to step 612; if not, jump to step 606;
Step 606: read the shared memory cell of the current packet processing thread, then jump to step 608;
Step 608: judge whether the information read is valid, i.e., whether the most significant bit is 1; if valid, jump to step 610; if invalid, jump back to step 606 and wait for the data to become valid;
Step 610: judge whether the obtained task is a middle micro-packet of the packet; if so, jump to step 612; if not, jump to step 614;
Step 612: invalidate the thread's own shared memory cell, then update the state information in the shared memory cell of the next thread and mark it valid, then jump to step 616;
Step 614: invalidate the thread's own shared memory cell, then write a null value to the shared memory cell of the next receiving thread and mark it valid, then jump to step 616;
Step 616: perform the concrete processing of the packet processing task.
2. The functional macropipeline implementation method based on a multi-core multi-threaded processor according to claim 1, characterized in that the synchronous communication between the receiving cluster and the transmitting cluster is realized by a shared queue structure of the serial pipeline based on a multi-producer, multi-consumer model, as follows: each thread of the receiving cluster acts as a producer and maintains the tail pointer of the queue; after completing the reception processing of a packet, it updates the two consecutive shared on-chip memory cells pointed to by the tail pointer, writing into the queue a descriptor composed of the packet state information to be passed to the sending thread, and increments the tail pointer by 2 to point to the next free unit of the queue; each thread of the transmitting cluster acts as a consumer and maintains the head pointer of the queue; when it starts to process the first micro-packet of a packet, it first reads the queue descriptor in the two consecutive shared on-chip memory cells pointed to by the head pointer to obtain the packet state information, and then increments the head pointer by 2 to point to the next valid queue descriptor; the mutual exclusion problem when threads within the same cluster and threads of different clusters access the same unit of the queue is solved by a mutual exclusion mechanism based on hardware read-operation locking, as follows:
Step 802: when a thread is about to access a queue unit, it attempts to lock the address of the pointer for that unit, then jumps to step 804;
Step 804: judge whether the address is locked by another thread; if so, jump to step 806; if not, jump to step 810;
Step 806: wait for the other access operation to unlock the address, then jump to step 808;
Step 808: judge whether the address has been unlocked; if the unlock has succeeded, jump to step 810; if not, jump back to step 806 and continue waiting for the address to be unlocked;
Step 810: lock the address of the pointer for the unit and begin the queue access operation, then jump to step 812;
Step 812: after the access ends, unlock the current address, allowing other threads that need to access the queue to obtain access rights.
CN201110309287.2A 2011-10-13 2011-10-13 Multi-core and multi-threading processor-based functional macropipeline implementing method Expired - Fee Related CN102331923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110309287.2A CN102331923B (en) 2011-10-13 2011-10-13 Multi-core and multi-threading processor-based functional macropipeline implementing method


Publications (2)

Publication Number Publication Date
CN102331923A CN102331923A (en) 2012-01-25
CN102331923B (en) 2015-04-22

Family

ID=45483709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110309287.2A Expired - Fee Related CN102331923B (en) 2011-10-13 2011-10-13 Multi-core and multi-threading processor-based functional macropipeline implementing method

Country Status (1)

Country Link
CN (1) CN102331923B (en)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708090B * 2012-05-16 2014-06-25 National University of Defense Technology Verification method for shared-storage multi-core multi-threading processor hardware lock
CN103780635B * 2012-10-17 2017-08-18 Baidu Online Network Technology (Beijing) Co., Ltd. Distributed asynchronous task queue execution system and method in cloud environment
CN103197920B * 2013-03-25 2016-08-03 Huawei Technologies Co., Ltd. Concurrency control method, control node and system
CN103281684B * 2013-05-31 2016-04-27 Chengdu Spaceon Electronics Co., Ltd. Beidou communication device and method
CN104239134B 2013-06-21 2018-03-09 Huawei Technologies Co., Ltd. Task management method and device for a many-core system
CN103685053B * 2013-11-26 2017-01-11 Beihang University Network processor load balancing and scheduling method based on residual task processing time compensation
CN103747011A * 2014-01-23 2014-04-23 Chengdu Kanuowei Technology Co., Ltd. High-bandwidth network safety system
CN104360962B * 2014-11-21 2015-10-28 Institute of Applied Physics and Computational Mathematics, Beijing Multistage nested data transmission method and system matched to high-performance computer architecture
CN104615445B * 2015-03-02 2017-09-26 Changsha Xinhong Software Co., Ltd. Device I/O queuing method based on atomic operations
CN106506393A * 2016-02-05 2017-03-15 Huawei Technologies Co., Ltd. Data stream processing method, device and system
CN105808357B * 2016-03-29 2021-07-27 Shenyang Aerospace University Multi-core multi-thread processor with accurately controllable performance
CN105959161B * 2016-07-08 2019-04-26 National University of Defense Technology High-speed packet construction and distribution control method and device
CN107077390B * 2016-07-29 2021-06-29 Huawei Technologies Co., Ltd. Task processing method and network card
CN107864391B * 2017-09-19 2020-03-13 Beijing Xiaoniao Technology Co., Ltd. Video stream cache distribution method and device
CN109831394B * 2017-11-23 2021-07-09 Huawei Technologies Co., Ltd. Data processing method, terminal and computer storage medium
CN108363624B * 2018-02-12 2022-04-19 Juhaokan Technology Co., Ltd. Method, device and server for orderly controlling storage information by lock-free threads
CN108494705A * 2018-03-13 2018-09-04 Shandong Chaoyue CNC Electronics Co., Ltd. Network message high-speed stamping die and method
CN108737292A * 2018-04-18 2018-11-02 Qianxun Location Network Co., Ltd. Method, system and server for sending bulk messages
CN109408118B * 2018-09-29 2024-01-02 Gu Jin MHP heterogeneous multi-pipeline processor
CN109614220B * 2018-10-26 2020-06-30 Alibaba Group Holding Ltd. Multi-core system processor and data updating method
CN109614152B * 2018-12-06 2022-11-04 Rongming Microelectronics (Jinan) Co., Ltd. Hardware acceleration module and storage device
CN109783229A * 2018-12-17 2019-05-21 Ping An Puhui Enterprise Management Co., Ltd. Method and device for thread resource allocation
CN109918209B * 2019-01-28 2021-02-02 DeepBlue Technology (Shanghai) Co., Ltd. Method and equipment for communication between threads
CN110011936B * 2019-03-15 2023-02-17 Beijing Star-Net Ruijie Networks Co., Ltd. Thread scheduling method and device based on multi-core processor
CN110147254A * 2019-05-23 2019-08-20 Suzhou Inspur Intelligent Technology Co., Ltd. Data cache processing method, device, equipment and readable storage medium
CN110381034B * 2019-06-25 2022-02-22 Suzhou Inspur Intelligent Technology Co., Ltd. Message processing method, device, equipment and readable storage medium
CN112416539B * 2019-08-21 2022-11-15 Wuxi Jiangnan Institute of Computing Technology Multi-task parallel scheduling method for heterogeneous many-core processor
CN110704199B * 2019-09-06 2024-07-05 Shenzhen Ping An Communication Technology Co., Ltd. Data compression method, device, computer equipment and storage medium
CN110730130B * 2019-10-22 2022-04-22 Maipu Communication Technology Co., Ltd. Message sending method, device, network equipment and storage medium
CN111884948B * 2020-07-09 2022-08-12 Fiberhome Telecommunication Technologies Co., Ltd. Pipeline scheduling method and device
CN112269392B * 2020-09-16 2023-08-22 Electric Power Research Institute of Guangxi Power Grid Co., Ltd. Unmanned aerial vehicle cluster control ground workstation system and control method thereof
CN112256208B * 2020-11-02 2023-07-28 Nanjing Yunxinda Technology Co., Ltd. Offline data packet storage and analysis method and device
CN113364685B * 2021-05-17 2023-03-14 National University of Defense Technology Distributed MAC table entry processing device and method
CN113360448B * 2021-06-04 2023-04-07 Spreadtrum Communications (Shanghai) Co., Ltd. Data packet processing method and device
CN115185878A * 2022-05-24 2022-10-14 Zhongke Yushu (Beijing) Technology Co., Ltd. Multi-core packet network processor architecture and task scheduling method
CN115080206B * 2022-06-14 2023-08-08 Harbin Institute of Technology High-speed echo data real-time recording system and method based on multi-threading mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1149870C (en) * 1995-05-04 2004-05-12 InterWave Communications International, Ltd. Spread spectrum communication network signal processor
US7565651B1 (en) * 2000-05-25 2009-07-21 Oracle International Corporation Parallel task scheduling system for computers
CN101015136A (en) * 2004-07-08 2007-08-08 Motorola, Inc. Method and apparatus for transmitting and receiving a data symbol stream
CN1889667A (en) * 2006-07-26 2007-01-03 Zhejiang University Video signal multi-processor parallel processing method
CN1987792A (en) * 2006-12-20 2007-06-27 Jin Kui Application system for advanced multi-thread management
CN101261613A (en) * 2007-03-09 2008-09-10 Nanjing University of Science and Technology Image processor cluster interface bus

Also Published As

Publication number Publication date
CN102331923A (en) 2012-01-25

Similar Documents

Publication Publication Date Title
CN102331923B (en) Multi-core and multi-threading processor-based functional macropipeline implementing method
US10102179B2 (en) Multiple core computer processor with globally-accessible local memories
Ajima et al. Tofu interconnect 2: System-on-chip integration of high-performance interconnect
US9734056B2 (en) Cache structure and management method for use in implementing reconfigurable system configuration information storage
US9195610B2 (en) Transaction info bypass for nodes coupled to an interconnect fabric
US8103853B2 (en) Intelligent fabric system on a chip
CN110347635A Heterogeneous multi-core microprocessor based on multilayer bus
Abadal et al. WiSync: An architecture for fast synchronization through on-chip wireless communication
CN103744644B Quad-core processor system built with a quad-core structure and data exchange method
CN105207957B (en) A kind of system based on network-on-chip multicore architecture
CN102521201A (en) Multi-core DSP (digital signal processor) system-on-chip and data transmission method
CN105183662A (en) Cache consistency protocol-free distributed sharing on-chip storage framework
CN105094751A (en) Memory management method used for parallel processing of streaming data
Hamidouche et al. Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
CN106951390A NUMA system construction method for reducing cross-node memory access delay
CN109542832A Lock-free communication system and method between heterogeneous multi-core CPUs
Contini et al. Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication
Lant et al. Enabling shared memory communication in networks of MPSoCs
Zimmer et al. Nocmsg: Scalable noc-based message passing
Mamidala et al. Optimizing mpi collectives using efficient intra-node communication techniques over the blue gene/p supercomputer
CN114116167B (en) High-performance computing-oriented regional autonomous heterogeneous many-core processor
US20240281395A1 (en) Embedded-Oriented Configurable Many-Core Processor
US8782164B2 (en) Implementing asyncronous collective operations in a multi-node processing system
Huang et al. Accelerating NoC-based MPI primitives via communication architecture customization
Kariniemi et al. NoC Interface for fault-tolerant Message-Passing communication on Multiprocessor SoC platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
DD01 Delivery of document by public notice

Addressee: Li Kang

Document name: Notice of termination of patent right

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150422

Termination date: 20201013