CN110888675A - Hardware system and electronic device - Google Patents

Hardware system and electronic device

Info

Publication number
CN110888675A
CN110888675A (application CN201811056568.XA)
Authority
CN
China
Prior art keywords
task
cache access
cache
access
subunit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811056568.XA
Other languages
Chinese (zh)
Other versions
CN110888675B (en)
Inventor
李炜
曹庆新
黎立煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201811056568.XA priority Critical patent/CN110888675B/en
Priority to PCT/CN2018/124854 priority patent/WO2020052171A1/en
Publication of CN110888675A publication Critical patent/CN110888675A/en
Application granted granted Critical
Publication of CN110888675B publication Critical patent/CN110888675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F 9/30047 Prefetch instructions; cache control instructions


Abstract

The embodiments of this application disclose a hardware system and an electronic device. The hardware system comprises a central processing unit, a data cache, a cache access unit, and a task manager. The central processing unit is connected to the task manager and the cache access unit, and the data cache is connected to the task manager and the cache access unit. The central processing unit issues at least one cache access task to the task manager and the cache access unit; the cache access unit executes the at least one cache access task to access the data cache; and the task manager monitors, based on the at least one cache access task, the execution of the at least one cache access task by the cache access unit. The scheme of the embodiments helps improve the operating efficiency of the hardware system and reduce interrupts to the central processing unit.

Description

Hardware system and electronic device
Technical Field
The application relates to the technical field of computers, and in particular to a hardware system and an electronic device.
Background
Generally, in the conventional scheme, data exchange among multiple functional modules in a system is performed through a shared data cache. After one functional module writes the data to be transferred into the shared data cache, it must interrupt the central processing unit, and the central processing unit then notifies the other functional module that the data is ready, so that the other functional module can read the data it needs from the shared data cache.
In practice, however, the conventional scheme interrupts the central processing unit continually, so the operating efficiency of the system is relatively low.
Disclosure of Invention
The embodiments of the application provide a hardware system and an electronic device, aiming to improve the operating efficiency of the hardware system and reduce interrupts to the central processing unit.
A first aspect of the present application provides a hardware system, comprising: a central processing unit, a data cache, a cache access unit, and a task manager;
the central processing unit is connected with the task manager and the cache access unit;
the data cache is connected with the task manager and the cache access unit;
the central processing unit is used for issuing at least one cache access task to the task manager and the cache access unit;
the cache access unit is used for executing the at least one cache access task to access the data cache;
the task manager is configured to monitor, based on the at least one cache access task, the execution of the at least one cache access task by the cache access unit.
In some possible embodiments, the at least one cache access task includes a cache access task T1 and a cache access task T0, and the cache access task T0 is a cache access task on which the cache access task T1 depends;
the task manager is specifically configured to, after receiving an access request q1 for the cache access task T1 from the cache access unit, determine whether to allow a response to the access request q1 based on a completion condition of the cache access task T0;
if the decision is made to allow the response to the access request q1, the task manager sends an access response aq1 for responding to the access request q1 to the cache access unit, and the access response aq1 is used for indicating that the cache access unit is allowed to execute the cache access task T1 on the data cache.
In some possible embodiments, when deciding whether to allow a response to the access request q1 based on the completion status of the cache access task T0,
the task manager is specifically configured to: when the cache access task T0 has been completed, decide to allow a response to the access request q1; when the cache access task T0 has not been completed, compare the pointer of the cache requested by the cache access task T1 with the current pointer of the cache access task T0, and if the pointer comparison passes, decide to allow a response to the access request q1; if the pointer comparison fails, decide not to allow a response to the access request q1.
In some possible embodiments, the data buffer includes a plurality of cache tiles, and the cache access task T1 includes the following fields:
a first field for indicating the slice identification of the accessed cache slice, a second field for indicating the start address of the accessed cache slice, a third field for indicating the length of the accessed data, a field for indicating the cache access task T0 on which the cache access task T1 depends.
In some possible embodiments, the access request q1 includes the following fields: the first field and a fourth field for indicating an element identification of a cache access element from which the access request q1 originated;
alternatively,
the access request q1 includes the following fields: the second field, the third field, and the fourth field;
alternatively, the access request q1 includes the following fields: the second field, the third field, and a fifth field indicating a task identification of the cache access task T1.
In some possible embodiments, the cache access unit includes the following access sub-units: the external data reading subunit, the external data restoring subunit, the internal data reading subunit and the internal data restoring subunit;
the hardware system also comprises an external memory and an arithmetic unit;
the external data reading subunit and the external data storing subunit are connected between the external memory and the data buffer; the internal data reading subunit and the internal data storing subunit are connected between the arithmetic unit and the data buffer.
The external data reading subunit is used for storing the data read from the external memory into the data buffer;
the external data storing subunit is used for storing the data read from the data buffer into the external memory;
the internal data reading subunit is used for providing the data read from the data buffer to the arithmetic unit for operation;
and the internal data storage subunit is used for storing the result data obtained by the operation of the operation unit into the data buffer.
In some possible embodiments, the central processor is connected to the task manager, the external data reading subunit, the external data restoring subunit, the internal data reading subunit and the internal data restoring subunit through a bus;
the central processing unit is specifically configured to issue, through the bus, the cache access task of the external data reading subunit to the external data reading subunit; issue the cache access task of the external data restoring subunit to the external data restoring subunit; issue the cache access task of the internal data reading subunit to the internal data reading subunit; issue the cache access task of the internal data restoring subunit to the internal data restoring subunit; and issue the cache access tasks of the external data reading subunit, the external data restoring subunit, the internal data reading subunit, and the internal data restoring subunit to the task manager.
In some possible embodiments, the task manager includes: a cache access controller and a cache access task queue;
the cache access task queue is used for storing the cache access tasks, issued by the central processing unit, of the external data reading subunit, the external data restoring subunit, the internal data reading subunit, and the internal data restoring subunit;
wherein, in deciding whether to allow response to the access request q1 based on the completion of the cache access task T0 relied upon by the cache access task T1, the cache access controller in the task manager is configured to,
when the cache access task T0 on which the cache access task T1 depends has been completed, decide to allow a response to the access request q1;
when the cache access task T0 has not been completed, compare the pointer of the cache requested by the cache access task T1 with the current pointer of the cache access task T0, and if the pointer comparison passes, decide to allow a response to the access request q1; if the pointer comparison fails, decide not to allow a response to the access request q1.
In some possible embodiments, the cache access task queue includes a first sub-queue, a second sub-queue, a third sub-queue, and a fourth sub-queue:
the first sub-queue is used for storing a cache access task of the external data reading sub-unit issued by the central processing unit;
the second sub-queue is used for storing the cache access task of the external data storage sub-unit issued by the central processing unit;
the third sub-queue is used for storing the cache access task of the internal data reading sub-unit issued by the central processing unit;
the fourth sub-queue is used for storing the cache access task of the internal data storage sub-unit issued by the central processing unit;
wherein the cache access controller comprises: a first sub-controller, a second sub-controller, a third sub-controller, and a fourth sub-controller:
the first sub-controller is a cache access controller of the external data reading sub-unit; the second sub-controller is a cache access controller of the external data storage subunit; the third sub-controller is a cache access controller of the internal data reading sub-unit; and the fourth sub-controller is a cache access controller of the internal data storage sub-unit.
A second aspect of the present application provides an electronic device, comprising: a housing and a hardware system accommodated in the housing, wherein the hardware system is any one of the hardware systems provided in the first aspect.
It can be seen that a task manager, a piece of hardware distinct from the central processing unit, is introduced into the hardware system. The central processing unit mainly issues cache access tasks to the task manager and the cache access unit, and the task manager monitors, based on the cache access tasks of the cache access unit, the execution of the at least one cache access task by the cache access unit. In other words, executing a cache access task no longer requires interrupt-driven control by the central processing unit; instead, the task manager is responsible for managing the data cache and scheduling the cache access unit, which helps reduce interrupts to the central processing unit. Because cache access tasks are completed automatically and cooperatively by hardware other than the central processing unit (namely, the task manager), dedicated hardware manages the data cache and the cache access unit. In the conventional technology, by contrast, the central processing unit manages data cache accesses as a side job in addition to its main work. The dedicated hardware responds faster than a part-time central processing unit, which helps improve the utilization of the data cache; and because interrupts to the central processing unit are reduced, the central processing unit can concentrate on work related to system operation, which helps greatly improve the operating efficiency of the whole system.
Within the cache access unit, further subunits can be defined according to different access mechanisms. For example, when the cache access unit includes subunits such as an external data reading subunit, an external data restoring subunit, an internal data reading subunit, and an internal data restoring subunit, the task manager can coordinate the data access work among these subunits, which helps improve the access efficiency of the data cache and, in turn, the utilization of the data cache.
Drawings
The drawings used in the description of the embodiments are briefly described below.
FIG. 1-A is a diagram illustrating a hardware system architecture according to an embodiment of the present disclosure;
FIG. 1-B is a diagram illustrating another hardware system architecture according to an embodiment of the present application;
FIG. 1-C is a diagram illustrating another hardware system architecture according to an embodiment of the present application;
FIG. 1-D is a diagram illustrating another hardware system architecture according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram illustrating a central processing unit issuing tasks to a task manager and each cache access unit through a task bus according to an embodiment of the present application;
fig. 3 is a schematic diagram of an internal architecture of a task manager according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an internal architecture of another task manager according to an embodiment of the present application;
FIG. 5 is a flow chart illustrating a processing method of a hardware system according to an embodiment of the present disclosure;
FIGS. 6-A and 6-B are schematic diagrams of two data formats of a task provided by an embodiment of the present application;
FIG. 7-A is a diagram illustrating a situation where sub-queues in a task manager buffer tasks according to an embodiment of the present application;
FIG. 7-B is a diagram illustrating another example of buffering tasks in sub-queues of a task manager according to an embodiment of the present application;
fig. 8 is a schematic diagram of an execution association relationship of some tasks provided in an embodiment of the present application.
Detailed Description
The embodiments of the application provide a hardware system and an electronic device, aiming to improve the operating efficiency of the hardware system and reduce interrupts to the central processing unit.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments that can be derived from the embodiments given herein by a person of ordinary skill in the art are intended to be within the scope of the present disclosure.
The details are described below. The terms "first", "second", "third", "fourth", "fifth", etc. in the description and claims of this application and in the above drawings are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may optionally include other steps or elements not listed, or steps or elements inherent to such a process, method, article, or apparatus.
The embodiment of the application aims to design a hardware system which is beneficial to reducing the interruption of a central processing unit and improving the operation efficiency of the hardware system.
Referring to FIG. 1-A, FIG. 1-A is a schematic diagram of a hardware system architecture provided by an embodiment of the present application. As shown by way of example in FIG. 1-A, the hardware system 100 mainly includes: a central processor 110, a data buffer 120, a cache access unit 130, and a task manager 140.
The central processor 110 is connected to the task manager 140 and the cache access unit 130.
Wherein the data cache 120 is connected to the task manager 140 and the cache access unit 130.
The central processor 110 is configured to issue at least one cache access task (e.g., 1 cache access task or multiple cache access tasks) to the task manager 140 and the cache access unit 130.
The cache access unit 130 is configured to perform the at least one cache access task to access the data cache 120.
The task manager 140 is configured to monitor, based on at least one cache access task of the cache access unit 130, the execution of the at least one cache access task by the cache access unit 130.
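The interaction among these three components can be illustrated with a minimal software model of the FIG. 1-A topology. All class, method, and field names below are illustrative, not taken from the patent: the CPU issues each task to both the task manager and the cache access unit, and the cache access unit consults the task manager before touching the cache, so no CPU interrupt is needed.

```python
class TaskManager:
    """Tracks issued tasks and which ones have completed (illustrative)."""
    def __init__(self):
        self.tasks = []         # tasks issued by the CPU
        self.completed = set()  # identifiers of finished tasks

    def issue(self, task):
        self.tasks.append(task)

    def may_execute(self, task):
        # Allow a task whose dependency (if any) has already completed.
        dep = task.get("depends_on")
        return dep is None or dep in self.completed

    def mark_done(self, task):
        self.completed.add(task["id"])


class CacheAccessUnit:
    """Executes tasks against the data cache when the manager allows."""
    def __init__(self, manager, cache):
        self.manager, self.cache, self.pending = manager, cache, []

    def issue(self, task):
        self.pending.append(task)

    def run_once(self):
        """One scheduling pass: execute every task the manager allows."""
        executed = []
        for task in list(self.pending):
            if self.manager.may_execute(task):  # access request / response
                if "data" in task:              # write-type task
                    self.cache[task["addr"]] = task["data"]
                else:                           # read-type task
                    _ = self.cache.get(task["addr"])
                self.manager.mark_done(task)
                self.pending.remove(task)
                executed.append(task["id"])
        return executed
```

For instance, issuing a write task t0 and a read task t1 that depends on it lets t1 run in the same pass once t0 finishes, with no interrupt to the CPU at any point.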
It can be seen that, in the above solution, a task manager is introduced into the hardware system. The central processing unit mainly issues cache access tasks to the task manager and the cache access unit, and the task manager monitors, based on the cache access tasks of the cache access unit, the execution of the at least one cache access task by the cache access unit 130. In other words, executing a cache access task no longer requires interrupt-driven control by the central processing unit; instead, the task manager is responsible for managing the data cache and scheduling the cache access unit, which helps reduce interrupts to the central processing unit. Because cache access tasks are completed automatically and cooperatively by dedicated hardware other than the central processing unit (namely, the task manager), this dedicated hardware responds faster than the conventional scheme, in which the central processing unit manages data cache accesses as a side job in addition to its main work, and the utilization of the data cache improves. And because interrupts to the central processing unit are reduced, the central processing unit can concentrate on work related to system operation, which helps greatly improve the operating efficiency of the whole system.
In some possible embodiments, at least two components of the central processor 110, the data buffer 120, the buffer access unit 130, and the task manager 140 may be connected by a bus. Specifically, for example, the central processor 110, the data buffer 120, the buffer access unit 130, and the task manager 140 are connected by a bus.
Of course, hardware system 100 may include other components in addition to those illustrated in FIG. 1-A. For example, referring to FIG. 1-B, FIG. 1-B is a schematic diagram of another hardware system architecture provided by an embodiment of the present application. As shown by way of example in FIG. 1-B, hardware system 100 may also include an arithmetic unit 150, and cache access unit 130 may include the following access sub-units: an internal data reading sub-unit 133 and an internal data restoring sub-unit 134. The internal data reading sub-unit 133 and the internal data restoring sub-unit 134 are connected between the arithmetic unit 150 and the data buffer 120.
The internal data reading sub-unit 133 is configured to provide the data read from the data buffer 120 to the operation unit 150 for operation. The internal data restoring sub-unit 134 is configured to store the result data obtained by the operation of the operation unit 150 into the data buffer 120.
The arithmetic unit 150 may include at least one arithmetic subunit, and the at least one arithmetic subunit may include at least one convolution arithmetic subunit, at least one addition arithmetic subunit, at least one subtraction arithmetic subunit, and/or at least one other arithmetic subunit, for example.
Referring to FIG. 1-C, FIG. 1-C is a schematic diagram of another hardware system architecture provided by an embodiment of the present application. As shown by way of example in FIG. 1-C, hardware system 100 may also include external memory 160, and cache access unit 130 may include the following access sub-units: an external data reading subunit 131 and an external data restoring subunit 132. The external data reading subunit 131 and the external data restoring subunit 132 are connected between the external memory 160 and the data buffer 120.
The external data reading subunit 131 is configured to store the data read from the external memory 160 into the data buffer 120. The external data restoring sub-unit 132 is used for storing the data read from the data buffer 120 into the external memory 160.
Referring to fig. 1-D, fig. 1-D is a schematic diagram of a hardware system architecture according to an embodiment of the present disclosure. As shown in fig. 1-D for example, hardware system 100 may also include both arithmetic unit 150 and external memory 160.
As shown in fig. 1-D for example, cache access unit 130 may include the following access sub-units: an external data reading sub-unit 131, an external data restoring sub-unit 132, an internal data reading sub-unit 133, and an internal data restoring sub-unit 134. The working mechanism of each access subunit in the cache access unit 130 is described in the above example.
More sub-units can be subdivided in the cache access unit based on different access mechanisms, for example, the cache access unit 130 includes sub-units such as an external data reading sub-unit 131, an external data restoring sub-unit 132, an internal data reading sub-unit 133, and an internal data restoring sub-unit 134, and then the task manager 140 can coordinate data access work among these sub-units, which is favorable for improving the access efficiency of the data cache and further improving the utilization rate of the data cache 120.
The external memory 160 is a memory relatively far from the computing unit 150, and the data buffer 120 is a memory relatively close to the computing unit 150. The external memory 160 may be, for example, an on-chip random access memory (RAM) or an off-chip RAM, and specifically a double data rate synchronous dynamic random access memory (DDR SDRAM) or another type of RAM.
In some embodiments of the present application, the cache access unit may, for example, access the relevant data using direct memory access (DMA) technology.
The external data reading subunit 131 may be, for example, an external input DMA (EIDMA), and can read data in the external memory 160 into the data buffer 120.
The external data restoring subunit 132 may be, for example, an external output DMA (EODMA), and can store the data in the data buffer 120 into the external memory 160.
The internal data reading subunit 133 may be, for example, an internal input DMA (IDMA), and can read data in the data buffer 120 into the operation unit 150 for operation.
The internal data restoring subunit 134 may be, for example, an output DMA (ODMA), and can store the operation result of the operation unit 150 back into the data buffer 120.
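The four DMA-style subunits just described can be summarized by their transfer directions. The following table in code form is purely descriptive; the acronyms follow the expansions given in the text above.

```python
# Directions of the four cache access subunits (source -> destination),
# matching the descriptions above. Purely descriptive, illustrative names.
ACCESS_SUBUNITS = {
    "EIDMA": ("external memory", "data buffer"),     # external data reading subunit 131
    "EODMA": ("data buffer", "external memory"),     # external data restoring subunit 132
    "IDMA":  ("data buffer", "arithmetic unit"),     # internal data reading subunit 133
    "ODMA":  ("arithmetic unit", "data buffer"),     # internal data restoring subunit 134
}
```

Note that each subunit moves data in exactly one direction, which is why four separate subunits (rather than two bidirectional ones) appear in the architecture.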
The central processor 110 may be, for example, a central processing unit (CPU) or another processor, such as a digital signal processor (DSP), a microprocessor, a micro control unit, or a neural network processor. In some specific applications, the components of the hardware system may be coupled together by, for example, a bus system. In addition to a data bus, the bus system may include a power bus, a control bus, a status signal bus, and the like. The central processor 110 may be an integrated circuit chip having signal processing capability. In some implementations, in addition to units for executing software instructions, the central processor 110 may include other hardware accelerators, such as application specific integrated circuits, field programmable gate arrays or other programmable logic devices, discrete gate or transistor logic, or discrete hardware components.
The central processing unit 110 is configured, for example, to issue tasks through a task bus (task_bus). The task manager 140 (task manager) manages the data cache 120 and performs coordinated, synchronized management of the respective cache access subunits.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating that the central processing unit 110 issues tasks to the task manager and each cache access unit through a task bus (task _ bus).
The following exemplifies some possible ways in which the task manager 140 cooperatively manages cache access units to access data caches.
In some possible embodiments, the task manager 140 may be configured to, after receiving an access request q1 for a cache access task T1 from a cache access unit, decide whether to allow a response to the access request q1 based on the completion status of a cache access task T0 on which the cache access task T1 depends. If it decides to allow a response to the access request q1, it sends an access response aq1 to the cache access unit; the access response aq1 indicates that the cache access unit is allowed to execute the cache access task T1 on the data cache.
For example, when deciding whether to allow a response to the access request q1 based on the completion status of the cache access task T0 on which the cache access task T1 depends, the task manager 140 may be configured to: when the cache access task T0 has been completed, decide to allow a response to the access request q1; when the cache access task T0 has not been completed, further compare the pointer of the cache requested by the cache access task T1 with the current pointer of the cache access task T0, and if the pointer comparison passes, decide to allow a response to the access request q1; if the pointer comparison fails, decide not to allow a response to the access request q1.
When the pointer of the cache requested by the cache access task T1 lags behind the current pointer of the cache access task T0, the pointer comparison passes; when it does not lag behind the current pointer of the cache access task T0, the pointer comparison fails.
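The decision rule above can be sketched as a single predicate. This is a minimal sketch, assuming the read-after-write semantics of the example below (a write task's current pointer marks how far it has written, and a read address "lags behind" it when the data at that address is already available); the function and parameter names are illustrative.

```python
def allow_response(t0_completed: bool, t1_request_ptr: int, t0_current_ptr: int) -> bool:
    """Decide whether access request q1 for task T1 may be answered.

    T1 depends on T0. If T0 has finished, always allow; otherwise allow
    only when the address T1 asks for still lags behind the point T0 has
    already reached, i.e. the 'pointer comparison' passes.
    """
    if t0_completed:
        return True
    return t1_request_ptr < t0_current_ptr  # pointer comparison
```

For instance, `allow_response(False, 0x40, 0x80)` passes because T0 has already written past address 0x40, while `allow_response(False, 0x80, 0x40)` fails because the requested data is not yet in the buffer.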
That the cache access task T1 depends on the cache access task T0 means that the execution of the cache access task T1 relies on the execution of the cache access task T0; that is, the execution of the cache access task T1 is premised on the successful execution of the cache access task T0, so the cache access task T1 can be executed only after the cache access task T0 has been executed successfully.
For example, assume the cache access task T1 reads the data buffer 120 (data memory) and the cache access task T0 writes the data buffer 120, and the data that T1 is to read is part or all of the data that T0 writes. Only after T0 has been executed successfully can T1 read the corresponding data from the data buffer 120; if T0 has not been executed successfully, the data T1 needs does not yet exist in the data buffer 120, so T1 cannot be executed successfully. In this case, the cache access task T1 depends on the cache access task T0.
The data buffer 120 may include a plurality of buffer slices (buffers).
In some possible embodiments, the cache access task T1 may include the following fields: a field for indicating a slice identification (buffer ID) of the accessed cache slice, a field for indicating a start address of the accessed cache slice, a field for indicating a length of the accessed data, a field for indicating a cache access task T0 on which the cache access task T1 depends, and the like. It can be seen that the above fields included in the cache access task are beneficial to clearly indicate the corresponding dependent cache access task, the accessed cache location and the accessed cache space length.
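The fields listed above can be pictured as a small record. The sketch below is illustrative only; field names, widths, and the example values are assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheAccessTask:
    """One cache access task descriptor; names and widths are illustrative."""
    buffer_id: int                    # slice identification (buffer ID) of the accessed cache slice
    start_addr: int                   # start address within the accessed cache slice
    length: int                       # length of the accessed data
    depends_on: Optional[int] = None  # the task T0 this task depends on, if any

# A hypothetical task T1 that reads 64 units from slice 2 and depends on task 0:
t1 = CacheAccessTask(buffer_id=2, start_addr=0x100, length=64, depends_on=0)
```

A task with `depends_on=None` has no dependency and can be scheduled as soon as it reaches the head of its queue.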
In some possible embodiments, the access request q1 may include the following fields: a field for indicating the slice identification of the accessed cache slice, and a field for indicating the unit identification of the cache access unit from which the access request q1 originated;
alternatively, the access request q1 may include the following fields: a field for indicating the start address of the accessed cache slice, a field for indicating the length of the accessed data, and a field for indicating the unit identification of the cache access unit from which the access request q1 originated;
alternatively, the access request q1 may include the following fields: a field for indicating the start address of the cache slice accessed, a field for indicating the length of the data accessed, and a field for indicating the task identification of the cache access task T1.
In some possible embodiments, the central processor 110 is connected to the task manager 140, the external data reading subunit 131, the external data restoring subunit 132, the internal data reading subunit 133 and the internal data restoring subunit 134 via a bus. The central processing unit 110 may then be specifically configured to issue, through the bus, the cache access task of the external data reading subunit 131 to the external data reading subunit 131; issue the cache access task of the external data restoring subunit 132 to the external data restoring subunit 132; issue the cache access task of the internal data reading subunit 133 to the internal data reading subunit 133; issue the cache access task of the internal data restoring subunit 134 to the internal data restoring subunit 134; and issue the cache access tasks of the external data reading subunit 131, the external data restoring subunit 132, the internal data reading subunit 133 and the internal data restoring subunit 134 to the task manager 140.
In some possible embodiments, referring to fig. 3, the task manager 140 may include: a cache access controller 142 and a cache access task queue 141. The cache access task queue is used for storing the cache access tasks of the external data reading subunit, the external data restoring subunit, the internal data reading subunit and the internal data restoring subunit, which are issued by the central processing unit.
In terms of deciding whether to allow a response to the access request q1 based on the completion of the cache access task T0 on which the cache access task T1 depends, the cache access controller 142 in the task manager 140 is operable to decide to allow the response to the access request q1 when the cache access task T0 on which the cache access task T1 depends has been completed; when the cache access task T0 is not completed, to perform a pointer comparison between the pointer at which the access request q1 requests to access the cache and the current pointer of the cache access task T0, and, if the pointer comparison passes, to decide to allow the response to the access request q1; if the pointer comparison fails, to decide not to allow the response to the access request q1.
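The decision rule can be sketched as a small predicate. The direction of the pointer comparison (a request is allowed only over the region the dependent task has already produced) is an illustrative assumption, and the function and parameter names are likewise hypothetical:

```python
def allow_response(t0_completed: bool, t0_current_ptr: int, q1_request_ptr: int) -> bool:
    """Decide whether to respond to access request q1 for task T1.

    t0_completed:   whether the dependent task T0 has finished.
    t0_current_ptr: how far T0 has progressed (e.g., its write pointer).
    q1_request_ptr: the position T1's request wants to access.
    The comparison direction is an illustrative assumption.
    """
    if t0_completed:
        return True  # dependency fully satisfied: always allow the response
    # T0 still running: allow only requests into the already-produced region
    return q1_request_ptr < t0_current_ptr

# T0 unfinished but already past the requested position: allow the response
assert allow_response(False, t0_current_ptr=512, q1_request_ptr=100)
# T0 unfinished and the request runs ahead of T0's progress: refuse (backpressure)
assert not allow_response(False, t0_current_ptr=512, q1_request_ptr=600)
```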
Referring to fig. 4, fig. 4 illustrates a case where the task manager 140 includes a cache access task queue 141 and a cache access controller 142.
Among other things, the cache access task queue 141 includes a first sub-queue (e.g., the eidma task queue), a second sub-queue (e.g., the eodma task queue), a third sub-queue (e.g., the idma task queue) and a fourth sub-queue (e.g., the odma task queue). The first sub-queue is configured to store the cache access task of the external data reading subunit issued by the central processing unit 110. The second sub-queue is configured to store the cache access task of the external data restoring subunit issued by the central processing unit 110. The third sub-queue is configured to store the cache access task of the internal data reading subunit issued by the central processing unit 110. The fourth sub-queue is configured to store the cache access task of the internal data restoring subunit issued by the central processing unit 110.
It can be understood that if the cache access task queue is further divided into sub-queues, so that different sub-queues store the cache access tasks to be executed by different cache access subunits, the cache access tasks can be classified and managed more conveniently, which helps simplify the reading of cache access tasks and further improves their reading efficiency.
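A minimal sketch of this sub-queue division, assuming simple software FIFOs keyed by the subunit names used in the examples below (eidma, eodma, idma, odma); the helper names are hypothetical:

```python
from collections import deque

# One FIFO per cache access subunit, mirroring the four sub-queues above.
task_queues = {name: deque() for name in ("eidma", "eodma", "idma", "odma")}

def issue(unit: str, task: str) -> None:
    """CPU side: append a cache access task to its subunit's sub-queue."""
    task_queues[unit].append(task)

def next_task(unit: str) -> str:
    """Controller side: pop the oldest pending task for the subunit."""
    return task_queues[unit].popleft()

issue("eidma", "eidma_task1")
issue("idma", "idma_task1")
assert next_task("eidma") == "eidma_task1"  # each subunit reads only its own queue
```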
Wherein the cache access controller 142 comprises:
a first sub-controller (e.g., the eidma cache access controller), a second sub-controller (e.g., the eodma cache access controller), a third sub-controller (e.g., the idma cache access controller), and a fourth sub-controller (e.g., the odma cache access controller). The first sub-controller is a cache access controller of the external data reading sub-unit. The second sub-controller is a cache access controller of the external data storage subunit; the third sub-controller is a cache access controller of the internal data reading sub-unit; and the fourth sub-controller is a cache access controller of the internal data storage sub-unit.
It can be seen that configuring each cache access unit with its own cache access sub-controller makes it easier to control the cache access units independently, which in turn helps further improve the cooperative control capability over the data access units.
Some of the workflow of the hardware system is described below in conjunction with the figures.
Referring to FIG. 5, a further operational flow of the hardware system 100 is described below. By way of example, this cooperative data-access flow shows how components such as the central processing unit and the task manager cooperate to complete access to the data cache.
501. The central processor 110 issues tasks (tasks) to the task manager 140, eodma132, eidma131, idma133, and odma134 over the task bus.
Among them, tasks that access the data buffer 120 are stored in the corresponding buffer task queues in the task manager 140. In one embodiment, only tasks that operate on the data cache 120 and have dependencies on each other enter the buffer task queues in the task manager 140.
Wherein, a task in the buffer task queue can indicate the following information:
the buffer ID (slice identification of the cache slice) of the accessed buffer, the start address of the accessed buffer, the accessed data length and the task on which the current task depends.
In addition, if the task on which the current task depends reads the data memory, the task may also indicate from which unit the input data of the current task comes (which may be referred to as the data source unit). If the task on which the current task depends writes the data memory, the task may also indicate to which unit the output data of the current task goes (which may be referred to as the data target unit).
Referring to fig. 6-A and 6-B, fig. 6-A and 6-B illustrate two data formats for a task. The task may include a buffer ID field for indicating the buffer ID of the accessed buffer, an address field for indicating the start address of the accessed buffer, a length field for indicating the length of the accessed data, a dependent-task field for recording the task on which the current task depends, and either a data source unit field for recording the identification of the data source unit or a data destination unit field for recording the data destination unit.
Suppose that the central processing unit issues the following tasks through task_bus. eidma_task1: read data001 from the external memory into buf0 of the data buffer 120. idma_task1: read the data001 written into buf0 by eidma_task1, and send the read data001 to the arithmetic unit for operation. odma_task1: store the operation result data002, obtained by the arithmetic unit operating on data001, into buf1 of the data buffer 120. eodma_task1: read the operation result data002 stored into buf1 by odma_task1, and store the read operation result data002 into the external memory.
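The four tasks above can be written out as plain records; the dictionary keys are illustrative assumptions, and the "dep" column reflects the execution order described (only the idma and eodma dependencies are stated explicitly in this description):

```python
# Tasks issued by the CPU over task_bus, expressed as records.
# "dep" names the task the current task follows (None = no dependency).
tasks = [
    {"name": "eidma_task1", "unit": "eidma", "buf": "buf0", "dep": None},
    {"name": "idma_task1",  "unit": "idma",  "buf": "buf0", "dep": "eidma_task1"},
    {"name": "odma_task1",  "unit": "odma",  "buf": "buf1", "dep": "idma_task1"},
    {"name": "eodma_task1", "unit": "eodma", "buf": "buf1", "dep": "odma_task1"},
]

# Sanity check: each task's dependency, if present, is issued before the task.
names = [t["name"] for t in tasks]
for t in tasks:
    assert t["dep"] is None or names.index(t["dep"]) < names.index(t["name"])
```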
Referring to FIG. 7-A, FIG. 7-A illustrates one case of the tasks cached in the sub-queues of the task manager that store the cache access tasks of eodma, eidma, idma and odma.
Wherein the eodma task queue is used for storing the cache access tasks of eodma, the eidma task queue is used for storing the cache access tasks of eidma, the idma task queue is used for storing the cache access tasks of idma, and the odma task queue is used for storing the cache access tasks of odma.
502. The eidma sends a memory request (task execution request) for the task eidma_task1 to the task manager.
503. The task manager receives the memory request for the task eidma_task1; the eidma cache access controller in the task manager reads the task eidma_task1 from the eidma task queue and decides whether to respond to the memory request for the task eidma_task1.
Wherein, if the task on which the task eidma_task1 depends has completed, the memory request for the task eidma_task1 may be responded to. If the task on which the task eidma_task1 depends has not yet completed, the memory request for the task eidma_task1 is not responded to. If the current request cannot be responded to, the data memory controller may apply backpressure to the unit that initiated the request. Backpressure here means notifying the requesting unit to extend the tolerance time for which it waits for the corresponding response; for example, if the normal tolerance time for waiting for a response is 2 seconds, backpressure may notify the requesting unit to extend it to 5 seconds or some other duration of no less than 2 seconds.
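The backpressure behavior can be sketched as a tiny helper that adjusts the requester's wait tolerance; the 2-second and 5-second values come from the example above, and the function name is an illustrative assumption:

```python
def wait_tolerance(backpressured: bool, normal_s: float = 2.0, extended_s: float = 5.0) -> float:
    """Return how long the requesting unit should wait for a response.

    Under backpressure the tolerance is extended; per the description above,
    the extended value must not be shorter than the normal one.
    """
    assert extended_s >= normal_s
    return extended_s if backpressured else normal_s

assert wait_tolerance(False) == 2.0  # normal wait while the request can be served
assert wait_tolerance(True) == 5.0   # extended wait while the dependency completes
```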
Here, take as an example that the task on which the task eidma_task1 depends has completed. The eidma cache access controller in the task manager may then notify the data memory controller to respond to the memory request of the task eidma_task1, specifically via an xxx_buf_rdy instruction.
504. The eidma131 reads the data to be processed, data001, from the external memory 160 into buf0 of the data buffer 120 as directed by eidma_task1.
505. The task manager 140 notifies (e.g., via an xxx_buf_rdy command) the idma to read the data001 from the data buffer 120.
506. Upon receiving the notification from the task manager 140, the idma133 reads the data001 to be used for calculation into the arithmetic unit 150. Accordingly, the arithmetic unit 150 operates on the data001 and obtains the operation result data002.
507. odma134 stores operation result data002 obtained by operation unit 150 operating on data001 back to buf1 of data buffer 120.
508. The task manager 140 may notify the eodma132 to read the data002 from the buf1 of the data cache 120 and then store it back to the external memory 160.
509. When the task manager 140 notifies the eodma132 to read the data002 from the data cache 120 and then store it back to the external memory 160, the eodma132 reads the data002 from the data cache 120 and then stores it back to the external memory 160.
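As a minimal sketch of steps 501-509, assuming plain Python dictionaries for the external memory and the data buffer, and a doubling operation standing in for the arithmetic unit (the real operation is unspecified in the description):

```python
# Minimal model of steps 501-509: eidma loads the input into buf0, idma feeds
# it to the arithmetic unit, odma stores the result into buf1, and eodma
# copies the result back to the external memory.
external_memory = {"data001": [1, 2, 3]}
data_buffer = {}  # models buf0/buf1 of the data buffer 120

def eidma():
    data_buffer["buf0"] = external_memory["data001"]        # step 504

def compute(values):
    return [x * 2 for x in values]                          # illustrative operation

def idma_odma():
    data_buffer["buf1"] = compute(data_buffer["buf0"])      # steps 506-507

def eodma():
    external_memory["data002"] = data_buffer["buf1"]        # steps 508-509

for step in (eidma, idma_odma, eodma):  # the task manager enforces this order
    step()

assert external_memory["data002"] == [2, 4, 6]
```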
It can be seen that in the example flow described above, the central processor dispatches tasks to eidma, idma, odma, eodma, and task manager via the task bus (task _ bus). eidma, idma, odma, and eodma would then work according to the issued task. The task manager dynamically monitors each cache access unit (eidma, idma, odma and eodma), and performs cooperative management of task execution of each access unit according to the issued task.
In the above example, only one task for each of eidma, idma, odma and eodma was described; of course, the central processing unit may issue more tasks to them. For example, the tasks issued by the central processing unit to each unit via task_bus may also be as follows:
eidma _ task 1: the initial data001 stored in the external memory is read into the buf0 of the data memory.
idma _ task 1: reading the data001 written in the buf0 by the eidma task1, and sending the read data001 to the process element for calculation.
odma _ task 1: the result data002 obtained by calculating the data001 by the process element is stored in the buf1 of the data memory.
idma _ task 2: the read odma task1 stores the data002 in the buf1, and sends the read data002 to the process element for calculation.
odma _ task 2: the result data003 obtained by calculating the data002 by the process element is stored in the buf0 of the data memory.
idma _ task 3: the read odma task2 stores the data003 in the buf0, and sends the read data003 to the process element for calculation.
odma _ task 3: the result data004 obtained by calculating the data003 by the process element is stored in the buf1 of the data memory.
eodma _ task 1: the read odma task3 is stored into the data004 in the buf1, and the read data004 is stored into the external memory.
Referring to FIG. 7-B, FIG. 7-B illustrates another case of the tasks cached in the sub-queues of the task manager that store the cache access tasks of eodma, eidma, idma and odma.
Referring to fig. 8, the execution association of these tasks is illustrated in fig. 8 by way of example. The dashed arrows indicate the execution order of tasks.
In the example shown in FIG. 8, eidma_task1 is executed first; eidma_task1 indicates that data001 is written to buf0. idma_task1, which depends on eidma_task1, is executed after eidma_task1 and indicates that data001 is read from buf0 for the compute unit to compute.
odma_task1 executes after idma_task1, and odma_task1 instructs to write the data002 resulting from calculating on data001 into buf1; idma_task2, which depends on odma_task1, executes after odma_task1 and instructs to read data002 from buf1 for the calculation unit to calculate.
odma_task2 is executed after idma_task2; odma_task2 instructs to write the data003 resulting from calculating on data002 into buf0. idma_task3, which depends on odma_task2, is executed after odma_task2 and instructs to read data003 from buf0 for the calculation unit to calculate.
odma_task3 is executed after idma_task3; odma_task3 instructs to write the data004 resulting from calculating on data003 into buf1. eodma_task1, which depends on odma_task3, is executed after odma_task3 and instructs to read the data004 from buf1 and store the read data004 into the external memory.
It is understood that the tasks issued by the central processing unit 110 may vary and are not limited to the examples described above. For the execution of other tasks, reference may be made to the exemplary execution processes in the foregoing embodiments, which are not repeated here.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A hardware system, comprising: the system comprises a central processing unit, a data buffer, a buffer access unit and a task manager;
the central processing unit is connected with the task manager and the cache access unit;
the data cache is connected with the task manager and the cache access unit;
the central processing unit is used for issuing at least one cache access task to the task manager and the cache access unit;
the cache access unit is used for executing the at least one cache access task to access the data cache;
the task manager is configured to monitor the cache access unit to execute the at least one cache access task based on the at least one cache access task.
2. The system of claim 1, wherein the at least one cache access task comprises a cache access task T1 and a cache access task T0, and wherein the cache access task T0 is a cache access task on which the cache access task T1 depends;
the task manager is specifically configured to, after receiving an access request q1 for the cache access task T1 from the cache access unit, determine whether to allow a response to the access request q1 based on a completion condition of the cache access task T0;
if the decision is made to allow the response to the access request q1, the task manager sends an access response aq1 for responding to the access request q1 to the cache access unit, and the access response aq1 is used for indicating that the cache access unit is allowed to execute the cache access task T1 on the data cache.
3. The system of claim 2, wherein when determining whether to allow response to the access request q1 based on the completion of the cache access task T0,
the task manager is specifically configured to, when the cache access task T0 is completed, determine to allow a response to the access request q 1; when the cache access task T0 is not completed, performing pointer comparison on a pointer of the cache requested by the cache access task T1 and a current pointer of the cache access task T0, and if the pointer comparison is passed, judging to allow the access request q1 to be responded; if the pointer comparison fails, then a decision is made not to allow response to the access request q 1.
4. The system of claim 3, wherein the data cache comprises a plurality of cache tiles, and wherein the cache access task T1 comprises the following fields:
a first field for indicating the slice identification of the accessed cache slice, a second field for indicating the start address of the accessed cache slice, a third field for indicating the length of the accessed data, a field for indicating the cache access task T0 on which the cache access task T1 depends.
5. System according to claim 3 or 4, characterized in that said access request q1 comprises the following fields: the first field and a fourth field for indicating an element identification of a cache access element from which the access request q1 originated;
alternatively,
the access request q1 includes the following fields: the second field, the third field, and the fourth field;
alternatively, the access request q1 includes the following fields: the second field, the third field, and a fifth field indicating a task identification of the cache access task T1.
6. The system of any of claims 1 to 5, wherein the cache access unit comprises the following access sub-units: the external data reading subunit, the external data restoring subunit, the internal data reading subunit and the internal data restoring subunit;
the hardware system also comprises an external memory and an arithmetic unit;
the external data reading subunit and the external data storing subunit are connected between the external memory and the data buffer; the internal data reading subunit and the internal data restoring subunit are connected between the arithmetic unit and the data buffer;
the external data reading subunit is used for storing the data read from the external memory into the data buffer;
the external data storing subunit is used for storing the data read from the data buffer into the external memory;
the internal data reading subunit is used for providing the data read from the data buffer to the arithmetic unit for operation;
and the internal data storage subunit is used for storing the result data obtained by the operation of the operation unit into the data buffer.
7. The system of claim 6, wherein said central processor is connected to said task manager, said external data reading subunit, said external data restoring subunit, said internal data reading subunit and said internal data restoring subunit via a bus;
the central processor is specifically configured to issue, through a bus, the cache access task of the external data reading subunit to the external data reading subunit; issue the cache access task of the external data restoring subunit to the external data restoring subunit; issue the cache access task of the internal data reading subunit to the internal data reading subunit; issue the cache access task of the internal data restoring subunit to the internal data restoring subunit; and issue the cache access tasks of the external data reading subunit, the external data restoring subunit, the internal data reading subunit and the internal data restoring subunit to the task manager.
8. The system of claim 5 or 6 or 7,
the task manager includes: a cache access controller and a cache access task queue;
the cache access task queue is used for storing the cache access tasks of the external data reading subunit, the external data restoring subunit, the internal data reading subunit and the internal data restoring subunit, which are issued by the central processing unit;
wherein, in deciding whether to allow response to the access request q1 based on the completion of the cache access task T0 relied upon by the cache access task T1, the cache access controller is configured to,
when the cache access task T0 depended by the cache access task T1 is completed, judging that the access request q1 is allowed to be responded;
when the cache access task T0 is not completed, performing pointer comparison on a pointer of the cache requested by the cache access task T1 and a current pointer of the cache access task T0, and if the pointer comparison is passed, judging to allow the access request q1 to be responded; if the pointer comparison fails, then a decision is made not to allow response to the access request q 1.
9. The system of claim 8, wherein the buffer access task queue comprises a first sub-queue, a second sub-queue, a third sub-queue, and a fourth sub-queue:
the first sub-queue is used for storing a cache access task of the external data reading sub-unit issued by the central processing unit;
the second sub-queue is used for storing the cache access task of the external data storage sub-unit issued by the central processing unit;
the third sub-queue is used for storing the cache access task of the internal data reading sub-unit issued by the central processing unit;
the fourth sub-queue is used for storing the cache access task of the internal data storage sub-unit issued by the central processing unit;
wherein the cache access controller comprises: a first sub-controller, a second sub-controller, a third sub-controller, and a fourth sub-controller:
the first sub-controller is a cache access controller of the external data reading sub-unit; the second sub-controller is a cache access controller of the external data storage subunit; the third sub-controller is a cache access controller of the internal data reading sub-unit; and the fourth sub-controller is a cache access controller of the internal data storage sub-unit.
10. An electronic device, comprising: a housing and a hardware system housed within the housing, the hardware system being as claimed in any one of claims 1 to 9.
CN201811056568.XA 2018-09-11 2018-09-11 Hardware system and electronic device Active CN110888675B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811056568.XA CN110888675B (en) 2018-09-11 2018-09-11 Hardware system and electronic device
PCT/CN2018/124854 WO2020052171A1 (en) 2018-09-11 2018-12-28 Hardware system and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811056568.XA CN110888675B (en) 2018-09-11 2018-09-11 Hardware system and electronic device

Publications (2)

Publication Number Publication Date
CN110888675A true CN110888675A (en) 2020-03-17
CN110888675B CN110888675B (en) 2021-04-06

Family

ID=69745493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811056568.XA Active CN110888675B (en) 2018-09-11 2018-09-11 Hardware system and electronic device

Country Status (2)

Country Link
CN (1) CN110888675B (en)
WO (1) WO2020052171A1 (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6356963B1 (en) * 1996-07-19 2002-03-12 Compaq Computer Corporation Long latency interrupt handling and input/output write posting
CN1473300A (en) * 2000-09-29 2004-02-04 Intelligent networks storage interface system and devices
CN1794213A (en) * 2004-12-24 2006-06-28 富士通株式会社 Direct memory access circuit and disk array device using same
CN102099800A (en) * 2008-03-26 2011-06-15 高通股份有限公司 Off-line task list architecture
US20120254210A1 (en) * 2011-03-28 2012-10-04 Siva Kiran Dhulipala Systems and methods of utf-8 pattern matching
CN102929714A (en) * 2012-10-19 2013-02-13 国电南京自动化股份有限公司 uC/OS-II-based hardware task manager
CN103207839A (en) * 2012-01-17 2013-07-17 国际商业机器公司 Method and system for cache management of track removal in cache for storage
CN103902364A (en) * 2012-12-25 2014-07-02 腾讯科技(深圳)有限公司 Physical resource management method and device and intelligent terminal
CN105677455A (en) * 2014-11-21 2016-06-15 深圳市中兴微电子技术有限公司 Device scheduling method and task administrator
US20170199818A1 (en) * 2013-06-22 2017-07-13 Microsoft Technology Licensing, Llc Log-structured storage for data access
CN108140009A (en) * 2015-10-13 2018-06-08 微软技术许可有限责任公司 B-tree key assignments manager of the distributed freedom formula based on RDMA
CN108241509A (en) * 2016-12-27 2018-07-03 英特尔公司 For efficiently handling the method and apparatus of the distribution of memory order buffer

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101997777B (en) * 2010-11-16 2012-10-10 福建星网锐捷网络有限公司 Interruption processing method, device and network equipment
CN102073481B (en) * 2011-01-14 2013-07-03 上海交通大学 Multi-kernel DSP reconfigurable special integrated circuit system
KR20160061726A (en) * 2014-11-24 2016-06-01 삼성전자주식회사 Method for handling interrupts
CN105511964B (en) * 2015-11-30 2019-03-19 华为技术有限公司 The treating method and apparatus of I/O request

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6356963B1 (en) * 1996-07-19 2002-03-12 Compaq Computer Corporation Long latency interrupt handling and input/output write posting
CN1473300A (en) * 2000-09-29 2004-02-04 Intelligent networks storage interface system and devices
CN1794213A (en) * 2004-12-24 2006-06-28 富士通株式会社 Direct memory access circuit and disk array device using same
CN102099800A (en) * 2008-03-26 2011-06-15 高通股份有限公司 Off-line task list architecture
US20120254210A1 (en) * 2011-03-28 2012-10-04 Siva Kiran Dhulipala Systems and methods of utf-8 pattern matching
CN103207839A (en) * 2012-01-17 2013-07-17 国际商业机器公司 Method and system for cache management of track removal in cache for storage
CN102929714A (en) * 2012-10-19 2013-02-13 国电南京自动化股份有限公司 uC/OS-II-based hardware task manager
CN103902364A (en) * 2012-12-25 2014-07-02 腾讯科技(深圳)有限公司 Physical resource management method and device and intelligent terminal
US20170199818A1 (en) * 2013-06-22 2017-07-13 Microsoft Technology Licensing, Llc Log-structured storage for data access
CN105677455A (en) * 2014-11-21 2016-06-15 深圳市中兴微电子技术有限公司 Device scheduling method and task administrator
CN108140009A (en) * 2015-10-13 2018-06-08 微软技术许可有限责任公司 B-tree key assignments manager of the distributed freedom formula based on RDMA
CN108241509A (en) * 2016-12-27 2018-07-03 英特尔公司 For efficiently handling the method and apparatus of the distribution of memory order buffer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
姚崎等: "面向多核处理器的Linux网络报文缓冲区重用机制研究", 《通信学报》 *
苏文等: "基于Cache锁和直接缓存访问的网络处理优化方法", 《计算机研究与发展》 *

Also Published As

Publication number Publication date
CN110888675B (en) 2021-04-06
WO2020052171A1 (en) 2020-03-19

Similar Documents

Publication Publication Date Title
US7117285B2 (en) Method and system for efficiently directing interrupts
US9606838B2 (en) Dynamically configurable hardware queues for dispatching jobs to a plurality of hardware acceleration engines
US11093297B2 (en) Workload optimization system
KR100977662B1 (en) Two-level interrupt service routine
US10552213B2 (en) Thread pool and task queuing method and system
EP1770520A2 (en) Operating cell processors over a network
EP3489815B1 (en) Method and system for low latency data management
US20110219373A1 (en) Virtual machine management apparatus and virtualization method for virtualization-supporting terminal platform
US20120297216A1 (en) Dynamically selecting active polling or timed waits
US10614004B2 (en) Memory transaction prioritization
US20160034332A1 (en) Information processing system and method
US20110047550A1 (en) Software program execution device, software program execution method, and program
US9286129B2 (en) Termination of requests in a distributed coprocessor system
US20110055831A1 (en) Program execution with improved power efficiency
CN110888675B (en) Hardware system and electronic device
US8719499B2 (en) Cache-line based notification
CN113076189B (en) Data processing system with multiple data paths and virtual electronic device constructed using multiple data paths
CN113076180B (en) Method for constructing uplink data path and data processing system
US20150149714A1 (en) Constraining prefetch requests to a processor socket
EP3853724B1 (en) I/o completion polling for low latency storage device
CN108958905B (en) Lightweight operating system of embedded multi-core central processing unit
CN108958904B (en) Driver framework of lightweight operating system of embedded multi-core central processing unit
CN110968418A (en) Signal-slot-based large-scale constrained concurrent task scheduling method and device
US20240160364A1 (en) Allocation of resources when processing at memory level through memory request scheduling
US20240004808A1 (en) Optimized prioritization of memory accesses

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant