CN116009966A - Circuit system and method for realizing heterogeneous multiprocessor operation distribution - Google Patents

Info

Publication number
CN116009966A
Authority
CN
China
Prior art keywords
data
memory buffer
type
arbiter
polling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310125761.9A
Other languages
Chinese (zh)
Inventor
蔡志恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310125761.9A priority Critical patent/CN116009966A/en
Publication of CN116009966A publication Critical patent/CN116009966A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a circuit system and a method for realizing operation allocation across heterogeneous multiprocessors, belonging to the technical field of operation allocation of heterogeneous processors. The system comprises a data receiving unit, a data polling arbiter, a data memory buffer unit, an operation polling arbiter, an operation accelerator, a data transmission arbiter and a data transmitting unit. According to the data type, the data receiving unit requests data buffering through the data polling arbiter and writes the data into the data memory buffer unit of the corresponding type; the operation polling arbiter then starts the selected operation accelerator, which, after the operation is completed, sends a request to the data transmission arbiter of the same type and returns the resulting data. The invention achieves effective resource allocation among different processors or accelerators in a heterogeneous system: by setting a type mark on the data and using the circuit system of the application to perform system-level task-parallel operation, different data are routed separately, avoiding the efficiency bottleneck of centralized integration and redistribution.

Description

Circuit system and method for realizing heterogeneous multiprocessor operation distribution
Technical Field
The invention belongs to the technical field of operation allocation of heterogeneous processors, and particularly relates to a circuit system and a method for realizing operation allocation of heterogeneous multiprocessors.
Background
In existing computer systems, a CPU or GPU has multiple cores and multiple threads and can perform parallel processing to accelerate specific applications. For example, one particular CPU core, e.g., CPU [0], may be reserved for the operating system or application software, while computing or allocation operations are pinned to other CPU cores, e.g., CPU [X]. However, new-generation systems-on-chip integrate more and more heterogeneous accelerators, and how to allocate the resources of each accelerator effectively is an urgent problem to be solved.
The prior patent with publication number CN114281559A discloses a multi-core processor, a synchronization method for the multi-core processor and corresponding products; it accelerates operation mainly by coordinating work among the cores through synchronization instructions.
The prior patent with publication number CN110879744A establishes a computational graph in which the relative index, within the memory space, of the operation data of at least one operation is declared; it creates a plurality of first threads, allocates a corresponding memory space to each first thread, and achieves data-parallel acceleration by directly copying the computational graph in a distributed environment.
The prior patent with publication number CN102591722B, a multithreading resource allocation processing method and system for a network-on-chip multi-core processor, uses a centralized control index table to allocate threads.
However, these three methods mainly address parallel operation on the same task or data using multi-core processors of the same architecture. They do not involve resource allocation among different processors or accelerators in a heterogeneous system, nor system-level task parallelism, and the third method additionally requires a centralized index to record and allocate the multi-core/multi-thread processors.
This is a deficiency of the prior art. It is therefore desirable to provide a circuit system and method for implementing heterogeneous multiprocessor operation distribution that addresses it.
Disclosure of Invention
In the prior art, cooperative work among multi-core processors consists of parallel operation by cores of the same architecture on the same task or data, and does not involve resource allocation among different processors or accelerators in a heterogeneous system or system-level task parallelism. To address these defects, the invention provides a circuit system and a method for realizing operation allocation across heterogeneous multiprocessors.
In a first aspect, the present invention provides a circuit system for implementing heterogeneous multiprocessor operation allocation, including a data receiving unit and a data transmitting unit;
the data receiving unit is connected with a plurality of types of data polling arbiters, each type of data polling arbiters is connected with a corresponding type of data memory buffer unit, each type of data memory buffer unit is connected with a corresponding type of operation polling arbiters, each type of operation polling arbiters is connected with a plurality of operation accelerators of corresponding types, and the operation accelerators are connected with the same type of data transmission arbiters;
the operation accelerator of the same type is connected with a data transmission arbiter of the same type;
each type of operation accelerator is connected with the data transmission unit;
the data receiving unit is also connected with each type of data memory buffer unit, and each type of data memory buffer unit is also connected with each operation accelerator of the same type;
each type of data memory buffer unit is connected with a read enable OR gate, and the read enable OR gate is connected with a group of operation accelerators of the same type. The data may be tasks or material data, and the data receiving unit receives either kind.
Further, the data memory buffer unit adopts a FIFO type data memory buffer unit, and controls the data to enter and exit according to the first-in first-out principle.
Further, the data polling arbiter is provided with a data request signal receiving interface and a data response signal sending interface;
the data memory buffer unit is provided with a data inlet, a data outlet, a non-empty signal output port, a write enable signal port and a read enable signal port;
the data request signal receiving interface is connected with the data receiving unit, and the data response signal sending interface is connected with the data receiving unit and the write enabling signal port;
the data inlet is connected with the data receiving unit;
the data outlet is connected with the same type of operation accelerator.
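As an illustration of the buffer ports listed above, the following is a minimal behavioral sketch in Python, not the circuit described by the invention. The class name `FifoBuffer`, the `clock` method, the default depth of 16, and the choice to drop writes when the buffer is full are all assumptions made for this example; the source does not specify a capacity or an overflow policy.

```python
from collections import deque


class FifoBuffer:
    """Behavioral model of one data memory buffer unit (hypothetical names)."""

    def __init__(self, depth=16):
        self.depth = depth      # assumed capacity; the source gives no figure
        self._queue = deque()

    @property
    def non_empty(self):
        """Models the non-empty signal output port."""
        return len(self._queue) > 0

    def clock(self, write_enable=False, data_in=None, read_enable=False):
        """Advance one cycle and return the value at the data outlet, if any.

        A read pops the oldest entry (first in, first out); a write appends
        data_in. Writes to a full buffer are dropped (an assumption).
        """
        data_out = None
        if read_enable and self._queue:
            data_out = self._queue.popleft()
        if write_enable and len(self._queue) < self.depth:
            self._queue.append(data_in)
        return data_out
```

In this model the non-empty flag is what would drive the enable port of the operation polling arbiter, and `read_enable` would be driven by the read enable OR gate.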
Further, the operation polling arbiter is provided with an enabling signal port, a plurality of operation request signal receiving interfaces and a plurality of operation response signal sending interfaces;
the enabling signal port is connected with a non-empty signal output port of the data memory buffer unit;
each operation request signal receiving interface is connected with a corresponding operation accelerator, and each operation response signal sending interface is connected with a corresponding operation accelerator and one input end of a reading enabling OR gate;
the output end of the read enable OR gate is connected with the read enable signal port of the data memory buffer unit.
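The operation polling arbiter and the read enable OR gate can be sketched behaviorally as follows. This sketch assumes that the "polling mechanism" is a round-robin grant, which the source does not spell out; `RoundRobinArbiter`, `arbitrate`, and `read_enable` are hypothetical names introduced for illustration.

```python
class RoundRobinArbiter:
    """Grants at most one requester per cycle, rotating priority (assumed policy)."""

    def __init__(self, n):
        self.n = n
        self._next = 0  # index with the highest priority on the next grant

    def arbitrate(self, requests, enable=True):
        """Return one response flag per requester.

        `enable` models the enable signal port, driven by the buffer's
        non-empty signal; when it is low, no response is issued.
        """
        grants = [False] * self.n
        if not enable:
            return grants
        for offset in range(self.n):
            i = (self._next + offset) % self.n
            if requests[i]:
                grants[i] = True
                self._next = (i + 1) % self.n
                break
        return grants


def read_enable(grants):
    """Read enable OR gate: asserted when any accelerator received a response."""
    return any(grants)
```

With two accelerators that both keep requesting, successive calls grant index 0, then index 1, then index 0 again, and each grant also asserts the buffer's read enable through the OR gate.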
Further, the data transmission arbiter is provided with a plurality of transmission request signal receiving interfaces and a plurality of transmission response signal sending interfaces;
each transmission request signal receiving interface is connected with one operation accelerator of the same type, and each data transmission response signal sending interface is connected with the data transmission unit and one operation accelerator of the same type.
Further, the device also comprises a multiplexer and a write enable OR gate;
the number of the data receiving units is a plurality of;
the number of the data request signal receiving interfaces and the number of the data response signal sending interfaces of the data polling arbiter are the same as the number of the data receiving units;
each data receiving unit is connected with one input end of the multiplexer through a data line, and the output end of the multiplexer is connected with a data inlet of the data memory buffer unit;
the data response signal sending interfaces of each type of data polling arbiter are each connected with one input end of the write enable OR gate of that type, and the output end of the write enable OR gate is connected with the write enable signal port of the data memory buffer unit. This ensures that multiple data receiving units can receive data simultaneously.
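The write path with several data receiving units can be sketched as two small functions. The source does not state what drives the multiplexer's select input; this sketch assumes the arbiter's response signals act as a one-hot select, and the function names are hypothetical.

```python
def write_enable_or(responses):
    """Write enable OR gate: the buffer is written when any receiver is granted."""
    return any(responses)


def multiplexer(data_lines, responses):
    """Forward the data line of the granted receiving unit to the data inlet.

    Assumes `responses` is one-hot (at most one True), as produced by an
    arbiter that grants a single requester per cycle. Returns None when no
    receiver is granted.
    """
    for data, granted in zip(data_lines, responses):
        if granted:
            return data
    return None
```

For example, with responses `[False, True]` the OR gate asserts write enable and the multiplexer forwards the second receiving unit's data.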
In a second aspect, the present invention provides a method for implementing heterogeneous multiprocessor operation allocation based on the first aspect, including the following steps:
S1, after the data receiving unit receives the data, it judges the data type and requests data buffering from the data polling arbiter of the corresponding type; once the data polling arbiter grants the request, the data is written into the data memory buffer unit of the corresponding type;
S2, after the data memory buffer unit detects that data has been written, the operation polling arbiter of the same type is started;
S3, the started operation polling arbiter selects one operation accelerator from the operation accelerators of the same type according to a polling mechanism and allows the selected operation accelerator to start;
S4, the started operation accelerator acquires data from the data memory buffer unit of the same type and performs the operation; after the operation is completed, it sends a request to the data transmission arbiter of the same type, and the resulting data is returned through the data transmission unit.
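The four steps can be illustrated end to end with a sequential software model. This is a sketch under stated assumptions, not the patented circuit: it assumes every accelerator is always ready, treats the polling mechanism as round-robin, and processes one type after another, whereas the hardware would run the types in parallel. `Lane` and `run` are hypothetical names.

```python
from collections import deque


class Lane:
    """All components serving one data type (hypothetical grouping)."""

    def __init__(self, accelerators):
        self.fifo = deque()               # data memory buffer unit
        self.accelerators = accelerators  # list of (name, operation) pairs
        self.next = 0                     # round-robin pointer (assumed policy)


def run(packets, lanes):
    """Route (data_type, payload) packets and return (type, accelerator, result)."""
    # S1: classify each packet by type and buffer it in the matching lane.
    for data_type, payload in packets:
        lanes[data_type].fifo.append(payload)

    returned = []
    for data_type, lane in lanes.items():
        # S2: a non-empty buffer starts the lane's operation polling arbiter.
        while lane.fifo:
            # S3: pick the next accelerator of this type in rotation.
            name, operation = lane.accelerators[lane.next]
            lane.next = (lane.next + 1) % len(lane.accelerators)
            # S4: the accelerator reads the oldest entry, computes, and
            # hands the result to the data transmission unit.
            returned.append((data_type, name, operation(lane.fifo.popleft())))
    return returned
```

Because each type has its own buffer and arbiter, packets of one type never wait behind packets of another, which is the point of splitting the data by type mark instead of using one central dispatcher.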
Further, the specific steps of step S1 are as follows:
S11, a type field is added in advance to the network packet carrying the data;
S12, after the data receiving unit receives the data, it judges the data type according to the type field;
S13, the data receiving unit requests data buffering from the data polling arbiter of the same type according to the data type;
S14, the data polling arbiter returns a response signal simultaneously to the data receiving unit and to the data memory buffer unit of the same type, so that the data memory buffer unit is enabled;
S15, the data receiving unit starts data transmission and writes the data into the enabled data memory buffer unit of the same type.
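Steps S11 to S15 can be sketched as follows. The source does not define the packet layout, so this example assumes a one-byte type field placed at the start of the packet and assumes the field is stripped before buffering; `tag_packet` and `receive` are hypothetical names, and with a single receiving unit the arbiter's grant is modeled as immediate.

```python
from collections import deque

TYPE_FIELD_BYTES = 1  # assumption: the source does not give the field width


def tag_packet(data_type, payload):
    """S11: prepend the type field to the payload."""
    if not 0 <= data_type <= 255:
        raise ValueError("data_type must fit in one byte")
    return bytes([data_type]) + payload


def receive(packet, buffers):
    """S12-S15: read the type field and write the payload to the matching buffer."""
    if len(packet) < TYPE_FIELD_BYTES:
        raise ValueError("packet too short to carry a type field")
    data_type = packet[0]                      # S12: judge the type
    if data_type not in buffers:
        raise ValueError(f"no buffer for type {data_type}")
    request = True                             # S13: request buffering
    response = request                         # S14: sole requester, so granted
    if response:                               # S15: write into the enabled buffer
        buffers[data_type].append(packet[TYPE_FIELD_BYTES:])
    return data_type
```

A caller would build `buffers` as a mapping from type value to one queue per data type, for example `{1: deque(), 2: deque()}`.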
Further, the specific steps of step S3 are as follows:
S31, the started operation polling arbiter receives the request signals of all operation accelerators of the same type and selects one of them according to a polling mechanism;
S32, the operation polling arbiter generates an operation response signal and returns it to the selected operation accelerator and to the data memory buffer unit of the same type;
S33, the operation accelerator that receives the operation response signal is started;
S34, once the operation polling arbiter has sent an operation response signal to any operation accelerator of its type, the data memory buffer unit of that type outputs the buffered data according to the first-in first-out principle.
Further, the specific steps of step S4 are as follows:
S41, the started operation accelerator receives the data output by the data memory buffer unit of the same type and operates on it;
S42, after the data operation is completed, the started operation accelerator sends a request to the data transmission arbiter of the same type;
S43, the data transmission arbiter returns a transmission response signal to the operation accelerator of the same type that completed the operation and to the data transmission unit;
S44, the operation accelerator that receives the transmission response signal transmits the resulting data to the data transmission unit;
S45, the data transmission unit that receives the transmission response signal sends the received data back.
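The return path serializes results from accelerators of one type onto the shared data transmission unit. The sketch below assumes that the data transmission arbiter grants one requester at a time in rotating order, which the source does not state explicitly; `transmit_all` and its parameters are hypothetical.

```python
def transmit_all(pending, n, start=0):
    """Return results in the order the data transmission arbiter grants them.

    pending: maps accelerator index -> finished result (S42: request asserted).
    n:       number of accelerators of this type.
    start:   index holding the highest priority for the first grant (assumed).
    """
    if any(i not in range(n) for i in pending):
        raise ValueError("accelerator index out of range")
    pending = dict(pending)  # do not modify the caller's mapping
    sent = []
    nxt = start
    while pending:
        for offset in range(n):
            i = (nxt + offset) % n
            if i in pending:
                # S43: response to accelerator i and to the transmission unit;
                # S44-S45: the accelerator hands over its result, which is sent.
                sent.append((i, pending.pop(i)))
                nxt = (i + 1) % n
                break
    return sent
```

If two accelerators finish in the same cycle, only one is granted first, so the transmission unit never receives two results at once.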
Further, when there are multiple data receiving units,
in step S14, the data polling arbiter returns the response signal to the data memory buffer unit of the same type through the write enable OR gate;
in step S15, the data receiving unit writes the data into the enabled data memory buffer unit of the same type through the multiplexer.
The advantages of the invention are as follows.
The circuit system and the method for realizing operation allocation across heterogeneous multiprocessors achieve effective resource allocation among different processors or accelerators in a heterogeneous system. By setting a type mark on the data and using the circuit system to perform system-level task-parallel operation, different data are routed separately, avoiding the efficiency bottleneck of centralized integration and redistribution.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
It can be seen that the present invention has outstanding substantial features and significant advances over the prior art, as well as the benefits of its implementation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of embodiment 3 of the circuit system for implementing heterogeneous multiprocessor operation allocation according to the present invention.
FIG. 2 is a schematic diagram of embodiment 4 of the circuit system for implementing heterogeneous multiprocessor operation allocation according to the present invention.
FIG. 3 is a flow chart of embodiment 5 of the method for implementing heterogeneous multiprocessor operation allocation according to the present invention.
FIG. 4 is a flow chart of embodiment 6 of the method for implementing heterogeneous multiprocessor operation allocation according to the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Example 1:
As shown in FIG. 1, the present invention provides a circuit system for implementing heterogeneous multiprocessor operation allocation, which includes a data receiving unit and a data transmitting unit;
the data receiving unit is connected with a plurality of types of data polling arbiters, each type of data polling arbiters is connected with a corresponding type of data memory buffer unit, each type of data memory buffer unit is connected with a corresponding type of operation polling arbiters, each type of operation polling arbiters is connected with a plurality of operation accelerators of corresponding types, and the operation accelerators are connected with the same type of data transmission arbiters;
the operation accelerator of the same type is connected with a data transmission arbiter of the same type;
each type of operation accelerator is connected with the data transmission unit;
the data receiving unit is also connected with each type of data memory buffer unit, and each type of data memory buffer unit is also connected with each operation accelerator of the same type;
each type of data memory buffer unit is connected with a read enable OR gate, and the read enable OR gate is connected with a group of operation accelerators of the same type.
Example 2:
As shown in FIG. 1, the present invention provides a circuit system for implementing heterogeneous multiprocessor operation allocation, which includes a data receiving unit and a data transmitting unit;
the data receiving unit is connected with a plurality of types of data polling arbiters, each type of data polling arbiters is connected with a corresponding type of data memory buffer unit, each type of data memory buffer unit is connected with a corresponding type of operation polling arbiters, each type of operation polling arbiters is connected with a plurality of operation accelerators of corresponding types, and the operation accelerators are connected with the same type of data transmission arbiters;
the operation accelerator of the same type is connected with a data transmission arbiter of the same type;
each type of operation accelerator is connected with the data transmission unit;
the data receiving unit is also connected with each type of data memory buffer unit, and each type of data memory buffer unit is also connected with each operation accelerator of the same type;
each type of data memory buffer unit is connected with a read enabling OR gate which is connected with a group of operation accelerators of the same type;
the data polling arbiter is provided with a data request signal receiving interface and a data response signal sending interface;
the data memory buffer unit is provided with a data inlet, a data outlet, a non-empty signal output port, a write enable signal port and a read enable signal port;
the data request signal receiving interface is connected with the data receiving unit, and the data response signal sending interface is connected with the data receiving unit and the write enabling signal port;
the data inlet is connected with the data receiving unit;
the data outlet is connected with the same type of operation accelerator;
the operation polling arbiter is provided with an enabling signal port, a plurality of operation request signal receiving interfaces and a plurality of operation response signal sending interfaces;
the enabling signal port is connected with a non-empty signal output port of the data memory buffer unit;
each operation request signal receiving interface is connected with a corresponding operation accelerator, and each operation response signal sending interface is connected with a corresponding operation accelerator and one input end of a reading enabling OR gate;
the output end of the read enable OR gate is connected with the read enable signal port of the data memory buffer unit;
the data transmission arbiter is provided with a plurality of transmission request signal receiving interfaces and a plurality of transmission response signal sending interfaces;
each transmission request signal receiving interface is connected with one operation accelerator of the same type, and each data transmission response signal sending interface is connected with the data transmission unit and one operation accelerator of the same type.
Example 3:
As shown in FIG. 1, the present invention provides a circuit system for implementing heterogeneous multiprocessor operation allocation, which includes a data receiving unit and a data transmitting unit;
the data receiving unit is connected with two types of data polling arbiters, a first data polling arbiter and a second data polling arbiter, the first data polling arbiter is connected with a first data memory buffer unit, and the second data polling arbiter is connected with a second data memory buffer unit;
the first data memory buffer unit is connected with a first operation polling arbiter, the first operation polling arbiter is connected with a first operation accelerator A and a first operation accelerator B, and the first operation accelerator A is connected with a first data transmission arbiter of the same type; the first operation accelerator B is also connected with the first data transmission arbiter;
the second data memory buffer unit is connected with a second operation polling arbiter, the second operation polling arbiter is connected with a second operation accelerator C and a second operation accelerator D, and the second operation accelerator C is connected with a second data transmission arbiter of the same type; the second operation accelerator D is also connected with the second data transmission arbiter;
the first operation accelerator A, the first operation accelerator B, the second operation accelerator C and the second operation accelerator D are all connected with the data transmission unit;
the data receiving unit is also connected with a first data memory buffer unit and a second data memory buffer unit, the first data memory buffer unit is also connected with a first operation accelerator A and a first operation accelerator B, and the second data memory buffer unit is also connected with a second operation accelerator C and a second operation accelerator D;
the first data memory buffer unit is connected with a first reading enabling OR gate, the second data memory buffer unit is connected with a second reading enabling OR gate, the first reading enabling OR gate is connected with the first operation accelerator A and the first operation accelerator B, and the second reading enabling OR gate is connected with the second operation accelerator C and the second operation accelerator D;
the data polling arbiter is provided with a data request signal receiving interface and a data response signal sending interface;
the data memory buffer unit is provided with a data inlet, a data outlet, a non-empty signal output port, a write enable signal port and a read enable signal port;
the data request signal interface of the first data polling arbiter and the data request signal interface of the second data polling arbiter are both connected with the data receiving unit, the data response signal sending interface of the first data polling arbiter and the data response signal sending interface of the second data polling arbiter are both connected with the data receiving unit, the data response signal sending interface of the first data polling arbiter is connected with the write enabling signal port of the first data memory buffer unit, and the data response signal sending interface of the second data polling arbiter is connected with the write enabling signal port of the second data memory buffer unit;
the data inlet of the first data memory buffer unit and the data inlet of the second data memory buffer unit are connected with the data receiving unit;
the data outlet of the first data memory buffer unit is connected with the first operation accelerator A and the first operation accelerator B;
the data outlet of the second data memory buffer unit is connected with the second operation accelerator C and the second operation accelerator D;
the operation polling arbiter is provided with an enabling signal port, a plurality of operation request signal receiving interfaces and two operation response signal sending interfaces;
the enabling signal port of the first operation polling arbiter is connected with the non-empty signal output port of the first data memory buffer unit, and the enabling signal port of the second operation polling arbiter is connected with the non-empty signal output port of the second data memory buffer unit;
the operation request signal receiving interface of the first operation polling arbiter is connected with the first operation accelerator A and the first operation accelerator B, and the operation request signal receiving interface of the second operation polling arbiter is connected with the second operation accelerator C and the second operation accelerator D;
the operation response signal sending interface of the first operation polling arbiter is connected with the first operation accelerator A and the first operation accelerator B, and the two operation response signal sending interfaces of the first operation polling arbiter are connected with the two input ends of the first reading enabling OR gate;
the operation response signal sending interface of the second operation polling arbiter is connected with the second operation accelerator C and the second operation accelerator D, and the two operation response signal sending interfaces of the second operation polling arbiter are connected with the two input ends of the second reading enabling OR gate;
the output end of the first reading enabling OR gate is connected with the reading enabling signal port of the first data memory buffer unit, and the output end of the second reading enabling OR gate is connected with the reading enabling signal port of the second data memory buffer unit;
the data transmission arbiter is provided with two transmission request signal receiving interfaces and two transmission response signal sending interfaces;
the first operation accelerator A and the first operation accelerator B are respectively connected with the two transmission request signal receiving interfaces of the first data transmission arbiter, and are respectively connected with the two transmission response signal sending interfaces of the first data transmission arbiter;
the second operation accelerator C and the second operation accelerator D are respectively connected with the two transmission request signal receiving interfaces of the second data transmission arbiter, and are respectively connected with the two transmission response signal sending interfaces of the second data transmission arbiter.
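The wiring of this example can be summarized as a small declarative table, which makes it easy to see that each data type has its own independent chain of components. The dictionary layout and the helper names `lane_of` and `or_gate_inputs` are assumptions introduced for illustration; the component names follow the text of the example.

```python
# Two data types, each with its own arbiters, buffer, OR gate and accelerators.
EXAMPLE_3 = {
    "first": {
        "data_polling_arbiter": "first data polling arbiter",
        "buffer": "first data memory buffer unit",
        "operation_polling_arbiter": "first operation polling arbiter",
        "read_enable_or_gate": "first read enable OR gate",
        "accelerators": ["A", "B"],
        "data_transmission_arbiter": "first data transmission arbiter",
    },
    "second": {
        "data_polling_arbiter": "second data polling arbiter",
        "buffer": "second data memory buffer unit",
        "operation_polling_arbiter": "second operation polling arbiter",
        "read_enable_or_gate": "second read enable OR gate",
        "accelerators": ["C", "D"],
        "data_transmission_arbiter": "second data transmission arbiter",
    },
}


def lane_of(topology, accelerator):
    """Return the data type whose component chain serves the given accelerator."""
    for lane, parts in topology.items():
        if accelerator in parts["accelerators"]:
            return lane
    raise KeyError(accelerator)


def or_gate_inputs(topology, lane):
    """A read enable OR gate needs one input per accelerator of its type."""
    return len(topology[lane]["accelerators"])
```

For instance, `lane_of(EXAMPLE_3, "C")` identifies the second chain, and each read enable OR gate in this example has two inputs.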
Example 4:
As shown in FIG. 2, the present invention provides a circuit system for implementing heterogeneous multiprocessor operation allocation, which includes a first data receiving unit, a second data receiving unit, and a data transmitting unit;
the first data receiving unit is connected with two types of data polling arbiters, a first data polling arbiter and a second data polling arbiter, the first data polling arbiter is connected with a first data memory buffer unit, and the second data polling arbiter is connected with a second data memory buffer unit;
the second data receiving unit is connected with the first data polling arbiter and the second data polling arbiter;
the first data memory buffer unit is connected with a first operation polling arbiter, the first operation polling arbiter is connected with a first operation accelerator A and a first operation accelerator B, and the first operation accelerator A is connected with a first data transmission arbiter of the same type; the first operation accelerator B is also connected with the first data transmission arbiter;
the second data memory buffer unit is connected with a second operation polling arbiter, the second operation polling arbiter is connected with a second operation accelerator C and a second operation accelerator D, and the second operation accelerator C is connected with a second data transmission arbiter of the same type; the second operation accelerator D is also connected with the second data transmission arbiter;
the first operation accelerator A, the first operation accelerator B, the second operation accelerator C and the second operation accelerator D are all connected with the data transmission unit;
the first data receiving unit is also connected with a first data memory buffer unit and a second data memory buffer unit, the first data memory buffer unit is also connected with a first operation accelerator A and a first operation accelerator B, and the second data memory buffer unit is also connected with a second operation accelerator C and a second operation accelerator D;
the second data receiving unit is connected with the first data memory buffer unit and the second data memory buffer unit;
the first data memory buffer unit is connected with a first read enable OR gate, a first write enable OR gate and a first multiplexer, the second data memory buffer unit is connected with a second read enable OR gate, a second write enable OR gate and a second multiplexer, the first read enable OR gate is connected with the first operation accelerator A and the first operation accelerator B, and the second read enable OR gate is connected with the second operation accelerator C and the second operation accelerator D;
the data polling arbiter is provided with two data request signal receiving interfaces and two data response signal sending interfaces;
the data memory buffer unit is provided with a data inlet, a data outlet, a non-empty signal output port, a write enable signal port and a read enable signal port;
the first data request signal interface of the first data polling arbiter and the first data request signal interface of the second data polling arbiter are both connected with the first data receiving unit, and the second data request signal interface of the first data polling arbiter and the second data request signal interface of the second data polling arbiter are both connected with the second data receiving unit;
the first data response signal sending interface of the first data polling arbiter and the first data response signal sending interface of the second data polling arbiter are both connected with the first data receiving unit, and the second data response signal sending interface of the first data polling arbiter and the second data response signal sending interface of the second data polling arbiter are both connected with the second data receiving unit;
the first data response signal sending interface and the second data response signal sending interface of the first data polling arbiter are connected with the two input ends of the first write enable OR gate, and the first data response signal sending interface and the second data response signal sending interface of the second data polling arbiter are connected with the two input ends of the second write enable OR gate;
the output end of the first write enable OR gate is connected with the write enable signal port of the first data memory buffer unit, and the output end of the second write enable OR gate is connected with the write enable signal port of the second data memory buffer unit;
the data input of the first data memory buffer unit is connected with the output end of the first multiplexer, the first data receiving unit and the second data receiving unit are connected with the two input ends of the first multiplexer, the data input of the second data memory buffer unit is connected with the output end of the second multiplexer, and the first data receiving unit and the second data receiving unit are connected with the two input ends of the second multiplexer;
the data outlet of the first data memory buffer unit is connected with the first operation accelerator A and the first operation accelerator B;
the data outlet of the second data memory buffer unit is connected with the second operation accelerator C and the second operation accelerator D;
the operation polling arbiter is provided with an enabling signal port, a plurality of operation request signal receiving interfaces and two operation response signal sending interfaces;
the enabling signal port of the first operation polling arbiter is connected with the non-empty signal output port of the first data memory buffer unit, and the enabling signal port of the second operation polling arbiter is connected with the non-empty signal output port of the second data memory buffer unit;
the operation request signal receiving interface of the first operation polling arbiter is connected with the first operation accelerator A and the first operation accelerator B, and the operation request signal receiving interface of the second operation polling arbiter is connected with the second operation accelerator C and the second operation accelerator D;
the operation response signal sending interface of the first operation polling arbiter is connected with the first operation accelerator A and the first operation accelerator B, and the two operation response signal sending interfaces of the first operation polling arbiter are connected with the two input ends of the first read enable OR gate;
the operation response signal sending interface of the second operation polling arbiter is connected with the second operation accelerator C and the second operation accelerator D, and the two operation response signal sending interfaces of the second operation polling arbiter are connected with the two input ends of the second read enable OR gate;
the output end of the first read enable OR gate is connected with the read enable signal port of the first data memory buffer unit, and the output end of the second read enable OR gate is connected with the read enable signal port of the second data memory buffer unit;
the data transmission arbiter is provided with two transmission request signal receiving interfaces and two transmission response signal sending interfaces;
the first operation accelerator A and the first operation accelerator B are respectively connected with the two transmission request signal receiving interfaces of the first data transmission arbiter, and are respectively connected with the two transmission response signal sending interfaces of the first data transmission arbiter;
the second operation accelerator C and the second operation accelerator D are respectively connected with the two transmission request signal receiving interfaces of the second data transmission arbiter, and are respectively connected with the two transmission response signal sending interfaces of the second data transmission arbiter.
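The operation-side control path of one type lane in the wiring above can be read as plain Boolean logic. The following Python sketch is only a behavioural model under stated assumptions, not the patented circuit: the function name `lane_control`, the `last_granted` state, and the tie-break rule (when both accelerators request, grant the one that was not granted last, starting with A) are illustrative choices, since the text only refers to a "polling mechanism".

```python
def lane_control(non_empty, req_a, req_b, last_granted):
    """Evaluate one type lane's operation-side signals for a single cycle.

    non_empty    -- non-empty signal of the lane's data memory buffer unit
    req_a, req_b -- operation request signals from the lane's two accelerators
    last_granted -- "A", "B" or None: hypothetical arbiter state (assumption)
    Returns (grant_a, grant_b, read_enable).
    """
    grant_a = grant_b = False
    # The operation polling arbiter only works while its enable port is
    # driven, i.e. while the buffer reports that it is not empty.
    if non_empty:
        if req_a and req_b:
            # Assumed polling rule: alternate between A and B, starting with A.
            grant_a = last_granted != "A"
            grant_b = not grant_a
        else:
            grant_a, grant_b = req_a, req_b
    # Read enable OR gate: either response signal enables a read of the buffer.
    read_enable = grant_a or grant_b
    return grant_a, grant_b, read_enable
```

With an empty buffer no accelerator is granted and the read enable stays low; with a non-empty buffer exactly one response signal is raised and the OR gate forwards it to the buffer's read enable port.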
Example 5:
As shown in Fig. 3, the present invention provides a method for implementing heterogeneous multiprocessor operation allocation based on embodiment 1 or embodiment 2, which includes the following steps:
S1, after the data receiving unit receives data, judging the data type, requesting data buffering from the data polling arbiter of the corresponding type, and writing the data into the data memory buffer unit of the corresponding type after the data polling arbiter grants the request;
S2, after the data memory buffer unit detects that data has been written, starting the operation polling arbiter of the same type;
S3, the started operation polling arbiter selects one operation accelerator from the operation accelerators of the same type according to a polling mechanism and allows the selected operation accelerator to start;
S4, the started operation accelerator acquires data from the data memory buffer unit of the same type and performs the operation, sends a request to the data transmission arbiter of the same type after the operation is completed, and the operation result is returned through the data transmission unit.
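Steps S1 to S4 can be sketched in software as a per-type queue feeding a group of workers that are selected in turn. The Python model below is a simplified, sequential illustration only: the class `TypeLane`, the function `run`, and the representation of accelerators as `(name, function)` pairs are assumptions made for this sketch, and busy accelerators, buffer capacity and the data transmission arbiter are deliberately left out.

```python
from collections import deque
from itertools import cycle


class TypeLane:
    """One data type's buffer plus its group of accelerators (illustrative)."""

    def __init__(self, accelerators):
        self.fifo = deque()               # stands in for the data memory buffer unit
        self._poll = cycle(accelerators)  # polling order of the operation polling arbiter

    def write(self, payload):
        # S1: the data is buffered once the data polling arbiter allows it.
        self.fifo.append(payload)

    def dispatch(self):
        results = []
        while self.fifo:                          # S2: non-empty buffer starts the arbiter
            name, func = next(self._poll)         # S3: select one accelerator by polling
            data = self.fifo.popleft()            # S4: the accelerator reads from the buffer
            results.append((name, func(data)))    # S4: the result is handed back
        return results


def run(items, lanes):
    """Route (data_type, payload) items to their lanes, then drain every lane."""
    for data_type, payload in items:
        lanes[data_type].write(payload)
    return {data_type: lane.dispatch() for data_type, lane in lanes.items()}
```

For example, with a lane "X" holding accelerators A and B that double their input and a lane "Y" holding accelerators C and D that upper-case a string, `run([("X", 1), ("Y", "p"), ("X", 2)], lanes)` hands the two X items to A and B in turn and the Y item to C.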
Example 6:
As shown in Fig. 4, the present invention provides a method for implementing heterogeneous multiprocessor operation allocation, which includes the following steps:
S1, after the data receiving unit receives data, judging the data type, requesting data buffering from the data polling arbiter of the corresponding type, and writing the data into the data memory buffer unit of the corresponding type after the data polling arbiter grants the request; the specific steps of step S1 are as follows:
S11, adding a type field in advance to the network packet of the data;
S12, after the data receiving unit receives the data, judging the data type according to the type field of the data;
S13, the data receiving unit requests data buffering from the data polling arbiter of the same type according to the data type;
S14, the data polling arbiter returns a response signal to the data receiving unit and to the data memory buffer unit of the same type at the same time, thereby enabling the data memory buffer unit;
S15, the data receiving unit starts data transmission and writes the data into the enabled data memory buffer unit of the same type;
S2, after the data memory buffer unit detects that data has been written, starting the operation polling arbiter of the same type;
S3, the started operation polling arbiter selects one operation accelerator from the operation accelerators of the same type according to a polling mechanism and allows the selected operation accelerator to start; the specific steps of step S3 are as follows:
S31, the started operation polling arbiter receives the request signals of all the operation accelerators of the same type and selects one of them according to the polling mechanism;
S32, the operation polling arbiter generates an operation response signal and returns it to the selected operation accelerator and to the data memory buffer unit of the same type;
S33, the operation accelerator that receives the operation response signal is started;
S34, after the operation polling arbiter sends an operation response signal to any operation accelerator of the same type, the data memory buffer unit outputs the buffered data according to the first-in first-out principle;
S4, the started operation accelerator acquires data from the data memory buffer unit of the same type and performs the operation, sends a request to the data transmission arbiter of the same type after the operation is completed, and the operation result is returned through the data transmission unit; the specific steps of step S4 are as follows:
S41, the started operation accelerator receives the data output by the data memory buffer unit of the same type and operates on the data;
S42, after the data operation is completed, the started operation accelerator sends a request to the data transmission arbiter of the same type;
S43, the data transmission arbiter returns a transmission response signal to the operation accelerator of the same type that has completed the operation and to the data transmission unit;
S44, the operation accelerator that receives the transmission response signal transmits the operation result to the data transmission unit;
S45, the data transmission unit that receives the transmission response signal sends the received data back.
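Steps S11 and S12 rely on a type field carried in the network packet. The Python sketch below shows one way such a field could be written and read; it is an assumption-laden illustration only. The patent names TYPE-X and TYPE-Y but gives no encoding, so the numeric codes, the 3-byte header layout (1-byte type, 2-byte payload length) and the function names `add_type_field` and `classify` are all hypothetical.

```python
import struct

# Hypothetical type codes; the source gives no numeric encoding.
TYPE_CODES = {"TYPE-X": 0x01, "TYPE-Y": 0x02}
CODE_TO_TYPE = {code: name for name, code in TYPE_CODES.items()}

# Assumed header layout: 1-byte type field, 2-byte payload length, network order.
HEADER = struct.Struct("!BH")


def add_type_field(data_type, payload):
    """S11: prepend the type field (and a length) to the payload bytes."""
    return HEADER.pack(TYPE_CODES[data_type], len(payload)) + payload


def classify(packet):
    """S12: read the type field and return (data_type, payload)."""
    code, length = HEADER.unpack_from(packet)
    if code not in CODE_TO_TYPE:
        raise ValueError(f"unknown type code {code:#x}")
    payload = packet[HEADER.size:HEADER.size + length]
    return CODE_TO_TYPE[code], payload
```

The data type returned by `classify` is what the data receiving unit would use in step S13 to choose which data polling arbiter to send its request to.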
In some embodiments, as shown in Fig. 2 and described in embodiment 4 above, when there are a plurality of data receiving units:
in step S14, the data polling arbiter returns the response signal to the data memory buffer unit of the same type through the write enable OR gate;
in step S15, the data receiving unit writes the data into the enabled data memory buffer unit of the same type through the multiplexer.
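The write side for several data receiving units can be modelled as an OR over the arbiter's response signals plus a multiplexer on the data lines. The Python sketch below is a behavioural illustration under assumptions: the source does not say what drives the multiplexer's select input, so the model assumes the same response signals steer it, and the function name `write_path` is hypothetical.

```python
def write_path(grants, receiver_data):
    """Model one buffer's write-side logic for several data receiving units.

    grants        -- one response signal per receiving unit from this type's
                     data polling arbiter (at most one should be asserted)
    receiver_data -- the data word each receiving unit is presenting
    Returns (write_enable, data_in); data_in is None when nothing is written.
    """
    if len(grants) != len(receiver_data):
        raise ValueError("one grant line per receiving unit is required")
    if sum(bool(g) for g in grants) > 1:
        raise ValueError("arbiter must grant at most one receiving unit")
    # Write enable OR gate: any asserted response signal enables the buffer.
    write_enable = any(grants)
    # Multiplexer: assumed to be steered by the same response signals.
    data_in = None
    for granted, word in zip(grants, receiver_data):
        if granted:
            data_in = word
    return write_enable, data_in
```

Because the arbiter grants only one receiving unit at a time, the buffer sees a single write enable and a single data word even though several receiving units share it.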
For embodiment 6, when there is one data receiving unit, as shown in Fig. 1, the data receiving unit judges the data type after receiving the first data; if the data type is TYPE-X, it sends a request to the first data polling arbiter, which corresponds to TYPE-X, and after the request is granted the data is written into the first data memory buffer unit;
once data is stored in the first data memory buffer unit, the unit generates a "non-empty" signal, which serves as the trigger signal for the first operation polling arbiter; the first operation polling arbiter serves two first operation accelerators, the first operation accelerator A and the first operation accelerator B, and according to the polling mechanism it allows one of them, for example the first operation accelerator A, to read the data and perform the operation;
if second data is received at this time, its data type is judged; if the data type is again TYPE-X, a request is sent to the same first data polling arbiter; the request is refused while the first data memory buffer unit is full and granted once data has been read out, and after it is granted the data is written into the first data memory buffer unit; because the first operation accelerator A is occupied by the first data, the first operation polling arbiter allows the first operation accelerator B to perform the operation according to the polling mechanism;
if third data is received while this processing is under way, its data type is likewise judged; if the data type is TYPE-Y, a request is sent to the second data polling arbiter, which corresponds to TYPE-Y, and after the request is granted the data is written into the second data memory buffer unit; the first data memory buffer unit may still hold the second data, but the second data memory buffer unit operates in parallel with the first, so it can be written directly regardless of whether the first data memory buffer unit is full; once it holds data, the second data memory buffer unit generates a "non-empty" signal, which triggers the second operation polling arbiter; the second operation polling arbiter serves two second operation accelerators, the second operation accelerator C and the second operation accelerator D, and according to the polling mechanism it allows one of them to read the data and perform the operation;
if fourth data is received while this processing is under way, its data type is likewise judged; if it is TYPE-X data, a request is sent to the first data polling arbiter, which corresponds to TYPE-X, and after the request is granted the data is written into the first data memory buffer unit; according to the polling mechanism the first operation polling arbiter finds that the first operation accelerator A and the first operation accelerator B are both still computing, so the fourth data must wait; when either or both of them finish, the arbiter allows one of them to perform the operation according to the polling mechanism;
after an operation is completed, the corresponding operation accelerator sends a request to the corresponding data transmission arbiter and then sends the result to the data transmission unit, which returns it.
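The walk-through above can be replayed with a small tick-based simulation. The Python sketch below is an illustration under assumptions rather than a model of the actual hardware timing: every operation is assumed to take a fixed number of ticks (`compute_ticks`), buffers are unbounded, the data transmission arbiter is omitted, and the names `simulate`, `d1` to `d4` and the arrival ticks are hypothetical.

```python
from collections import deque


def simulate(arrivals, lanes, compute_ticks=3, max_ticks=50):
    """Replay arrivals through per-type buffers and polled accelerators.

    arrivals -- list of (tick, data_type, label)
    lanes    -- {data_type: [accelerator names]}
    Returns a list of (start_tick, label, accelerator) in start order.
    """
    fifos = {t: deque() for t in lanes}
    busy_until = {name: 0 for names in lanes.values() for name in names}
    next_index = {t: 0 for t in lanes}   # polling pointer per type
    pending = sorted(arrivals)
    log = []
    for tick in range(max_ticks):
        # Data receiving unit: buffer everything that arrives on this tick.
        while pending and pending[0][0] == tick:
            _, data_type, label = pending.pop(0)
            fifos[data_type].append(label)
        # Each type lane is arbitrated independently of the others.
        for data_type, names in lanes.items():
            fifo = fifos[data_type]
            while fifo:                  # a non-empty buffer enables the arbiter
                count = len(names)
                order = [(next_index[data_type] + i) % count for i in range(count)]
                free = [i for i in order if busy_until[names[i]] <= tick]
                if not free:
                    break                # all accelerators busy: data waits
                chosen = free[0]
                busy_until[names[chosen]] = tick + compute_ticks
                next_index[data_type] = (chosen + 1) % count
                log.append((tick, fifo.popleft(), names[chosen]))
        if not pending and not any(fifos.values()):
            break
    return log
```

Feeding it two TYPE-X items, one TYPE-Y item and a further TYPE-X item in quick succession, with accelerators A and B on the X lane and C and D on the Y lane, reproduces the behaviour described: the first two X items go to A and B, the Y item starts immediately on C, and the last X item waits until A becomes free.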
Although the present invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present invention is not limited thereto. Various equivalent modifications and substitutions may be made to the embodiments of the present invention by those skilled in the art without departing from the spirit and scope of the present invention, and it is intended that all such modifications and substitutions fall within the scope of the present invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A circuit system for realizing heterogeneous multiprocessor operation allocation, which is characterized by comprising a data receiving unit and a data transmitting unit;
the data receiving unit is connected with a plurality of types of data polling arbiters, each type of data polling arbiters is connected with a corresponding type of data memory buffer unit, each type of data memory buffer unit is connected with a corresponding type of operation polling arbiters, each type of operation polling arbiters is connected with a plurality of operation accelerators of corresponding types, and the operation accelerators are connected with the same type of data transmission arbiters;
the operation accelerator of the same type is connected with a data transmission arbiter of the same type;
each type of operation accelerator is connected with the data transmission unit;
the data receiving unit is also connected with each type of data memory buffer unit, and each type of data memory buffer unit is also connected with each operation accelerator of the same type;
each type of data memory buffer unit is connected with a read enable OR gate, and the read enable OR gate is connected with a group of operation accelerators of the same type.
2. The circuitry for implementing heterogeneous multiprocessor operation allocation of claim 1, wherein the data poll arbiter is provided with a data request signal receiving interface and a data reply signal issuing interface;
the data memory buffer unit is provided with a data inlet, a data outlet, a non-empty signal output port, a write enable signal port and a read enable signal port;
the data request signal receiving interface is connected with the data receiving unit, and the data response signal sending interface is connected with the data receiving unit and the write enabling signal port;
the data inlet is connected with the data receiving unit;
the data outlet is connected with the same type of operation accelerator.
3. The circuit system for realizing the operation distribution of the heterogeneous multiprocessor according to claim 2, wherein the operation polling arbiter is provided with an enabling signal port, a plurality of operation request signal receiving interfaces and a plurality of operation response signal sending interfaces;
the enabling signal port is connected with a non-empty signal output port of the data memory buffer unit;
each operation request signal receiving interface is connected with a corresponding operation accelerator, and each operation response signal sending interface is connected with a corresponding operation accelerator and one input end of the read enable OR gate;
the output end of the read enable OR gate is connected with the read enable signal port of the data memory buffer unit.
4. The circuit system for implementing heterogeneous multiprocessor operation allocation according to claim 3, wherein the data transfer arbiter is provided with a plurality of transfer request signal receiving interfaces and a plurality of transfer reply signal issuing interfaces;
each transmission request signal receiving interface is connected with one operation accelerator of the same type, and each data transmission response signal sending interface is connected with the data transmission unit and one operation accelerator of the same type.
5. The circuitry for implementing heterogeneous multiprocessor operation allocation of claim 4, further comprising a multiplexer and a write enable OR gate;
there are a plurality of data receiving units;
the number of the data request signal receiving interfaces and the number of the data response signal sending interfaces of the data polling arbiter are the same as the number of the data receiving units;
each data receiving unit is connected with one input end of the multiplexer through a data line, and the output end of the multiplexer is connected with a data inlet of the data memory buffer unit;
each data response signal sending interface of each type of data polling arbiter is connected with one input end of the write enable OR gate, and the output end of the write enable OR gate is connected with the write enable signal port of the data memory buffer unit.
6. A method for implementing heterogeneous multiprocessor operation allocation based on any of claims 1-5, comprising the steps of:
S1, after the data receiving unit receives data, judging the data type, requesting data buffering from the data polling arbiter of the corresponding type, and writing the data into the data memory buffer unit of the corresponding type after the data polling arbiter grants the request;
S2, after the data memory buffer unit detects that data has been written, starting the operation polling arbiter of the same type;
S3, the started operation polling arbiter selects one operation accelerator from the operation accelerators of the same type according to a polling mechanism and allows the selected operation accelerator to start;
S4, the started operation accelerator acquires data from the data memory buffer unit of the same type and performs the operation, sends a request to the data transmission arbiter of the same type after the operation is completed, and the operation result is returned through the data transmission unit.
7. The method for implementing heterogeneous multiprocessor operation allocation according to claim 6, wherein step S1 comprises the specific steps of:
S11, adding a type field in advance to the network packet of the data;
S12, after the data receiving unit receives the data, judging the data type according to the type field of the data;
S13, the data receiving unit requests data buffering from the data polling arbiter of the same type according to the data type;
S14, the data polling arbiter returns a response signal to the data receiving unit and to the data memory buffer unit of the same type at the same time, thereby enabling the data memory buffer unit;
S15, the data receiving unit starts data transmission and writes the data into the enabled data memory buffer unit of the same type.
8. The method for implementing heterogeneous multiprocessor operation allocation according to claim 7, wherein step S3 comprises the specific steps of:
S31, the started operation polling arbiter receives the request signals of all the operation accelerators of the same type and selects one of them according to the polling mechanism;
S32, the operation polling arbiter generates an operation response signal and returns it to the selected operation accelerator and to the data memory buffer unit of the same type;
S33, the operation accelerator that receives the operation response signal is started;
S34, after the operation polling arbiter sends an operation response signal to any operation accelerator of the same type, the data memory buffer unit outputs the buffered data according to the first-in first-out principle.
9. The method for implementing heterogeneous multiprocessor operation allocation according to claim 8, wherein step S4 comprises the specific steps of:
S41, the started operation accelerator receives the data output by the data memory buffer unit of the same type and operates on the data;
S42, after the data operation is completed, the started operation accelerator sends a request to the data transmission arbiter of the same type;
S43, the data transmission arbiter returns a transmission response signal to the operation accelerator of the same type that has completed the operation and to the data transmission unit;
S44, the operation accelerator that receives the transmission response signal transmits the operation result to the data transmission unit;
S45, the data transmission unit that receives the transmission response signal sends the received data back.
10. The method for implementing heterogeneous multiprocessor operation allocation of claim 6, wherein when the number of data receiving units is a plurality,
in step S14, the data polling arbiter returns the response signal to the data memory buffer unit of the same type through the write enable OR gate;
in step S15, the data receiving unit writes the data into the enabled data memory buffer unit of the same type through the multiplexer.
CN202310125761.9A 2023-02-16 2023-02-16 Circuit system and method for realizing heterogeneous multiprocessor operation distribution Pending CN116009966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310125761.9A CN116009966A (en) 2023-02-16 2023-02-16 Circuit system and method for realizing heterogeneous multiprocessor operation distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310125761.9A CN116009966A (en) 2023-02-16 2023-02-16 Circuit system and method for realizing heterogeneous multiprocessor operation distribution

Publications (1)

Publication Number Publication Date
CN116009966A true CN116009966A (en) 2023-04-25

Family

ID=86037413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310125761.9A Pending CN116009966A (en) 2023-02-16 2023-02-16 Circuit system and method for realizing heterogeneous multiprocessor operation distribution

Country Status (1)

Country Link
CN (1) CN116009966A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination