CN115481072A - Inter-core data transmission method, multi-core chip and machine-readable storage medium

Inter-core data transmission method, multi-core chip and machine-readable storage medium

Info

Publication number
CN115481072A
Authority
CN
China
Prior art keywords
core
data
data transmission
receiving
receiving core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210918127.6A
Other languages
Chinese (zh)
Inventor
黎金旺
李德建
谭浪
冯曦
杨小坤
刘畅
徐立国
刘滢浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Beijing Smartchip Microelectronics Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Beijing Smartchip Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Jiangsu Electric Power Co Ltd, Beijing Smartchip Microelectronics Technology Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202210918127.6A priority Critical patent/CN115481072A/en
Publication of CN115481072A publication Critical patent/CN115481072A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS — G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/163 Interprocessor communication
    • G06F15/781 System on chip / system in package: on-chip cache; off-chip memory
    • G06F15/7839 Architectures of general purpose stored program computers comprising a single central processing unit with memory

Abstract

Embodiments of the invention provide an inter-core data transmission method, a multi-core chip and a machine-readable storage medium, belonging to the field of communication technology. The method comprises: at the sending core, sending a data transmission instruction to the receiving core, where the data transmission instruction contains the data storage address of the data to be transmitted; and at the receiving core, decoding the data transmission instruction to obtain the data storage address, acquiring the data to be transmitted based on that address, and storing it in the private cache region of the receiving core. Because the data to be processed is cached in the receiving core's private cache region by means of the data transmission instruction, the receiving core can hit the data directly in its private cache when processing the data task, without accessing the shared memory, which greatly reduces clock consumption and improves processor performance.

Description

Inter-core data transmission method, multi-core chip and machine-readable storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an inter-core data transmission method, a multi-core chip, and a machine-readable storage medium.
Background
Single-core processors can no longer meet user requirements, which has driven the adoption of multi-core processors. A multi-core processor involves two types of inter-core communication: the exchange of control-state information between cores, and the transmission of data between cores. For inter-core data transmission there are currently two main implementations:
The first transmits and processes data in a pipeline fashion. For example, after CPU0 completes data processing task 0, it caches the data in its private cache region and updates the related data in the memory bank; CPU1 fetches the data from CPU0's private cache region, completes data processing task 1, and stores the result in its own private cache region; CPU2 then fetches it to complete data processing task 2, and so on. Fetching data from another core's cache in this way incurs a long latency. Moreover, in such a pipeline, data interaction between two cores often still has to go through shared memory, which brings the drawbacks of the shared-memory approach described next.
The second is implemented purely on shared memory: for example, CPU0 and CPU1 each store data in the shared memory, and each obtains the data stored by the other through the shared memory. However, the clock cycles a CPU consumes when accessing shared memory are far more than those consumed when accessing its private cache region, which results in poor real-time access performance.
Therefore, current inter-core data transmission schemes significantly degrade the real-time performance, reliability and stability of a multi-core processor.
Disclosure of Invention
An object of the embodiments of the present invention is to provide an inter-core data transmission method, a multi-core chip and a machine-readable storage medium, which are used to at least partially solve the above technical problems.
In order to achieve the above object, an embodiment of the present invention provides an inter-core data transmission method, which is applied to a multi-core chip having a sending core and a receiving core, and includes: at the sending core end, sending a data transmission instruction to the receiving core, wherein the data transmission instruction comprises a data storage address of data to be transmitted; and at the receiving core end, decoding the data transmission instruction to obtain the data storage address, acquiring the data to be transmitted based on the data storage address, and storing the data to be transmitted into a private cache region of the receiving core.
Optionally, the sending core and the receiving core are each configured with a state machine, the state machines are used for monitoring and managing data states in the corresponding cores, and the data transmission instruction further includes data states obtained through monitoring by the state machines.
Optionally, the inter-core data transmission method further includes: when the data to be transmitted is stored in a private cache region of the receiving core, updating the data state of the data to be transmitted from a private state to a shared state through the state machine; and/or for the situation that the sending core and the receiving core respectively cache data corresponding to the same task in the private cache region, if one of the sending core and the receiving core processes the data, modifying the data state of the corresponding data in the private cache region of the other one into an invalid state through the state machine.
Optionally, before the decoding the data transmission instruction, the inter-core data transmission method further includes: and when the receiving core receives the data transmission instruction, comparing the priority between the data transmission instruction and the current processing task of the receiving core, and interrupting the current processing task when the priority of the data transmission instruction is higher.
Optionally, comparing the priority between the data transmission instruction and the current processing task of the receiving core includes: configuring, in the data transmission instruction, a priority arbiter address pointing to a preset priority arbiter; and obtaining the priority arbiter address from the data transmission instruction to trigger the priority arbiter to compare the priority of the data transmission instruction with that of the current processing task of the receiving core.
Optionally, the interrupting the current processing task includes: sending a signal to a preset interrupt controller through a preset interrupt signal stack so that the interrupt controller generates an interrupt signal to interrupt the current processing task; and writing the execution data corresponding to the interrupted current processing task into the shared memory of the sending core and the receiving core through a preset field signal stack.
Optionally, the multi-core chip is configured to adopt an AMP structure, and in the AMP structure, one of the sending core and the receiving core is configured to run an operating system, and the other is configured to run a bare metal program.
An embodiment of the present invention further provides a multi-core chip, where the multi-core chip at least includes a sending core and a receiving core, and: the sending core is configured to send a data transmission instruction to the receiving core, wherein the data transmission instruction comprises a data storage address of data to be transmitted; and the receiving core is configured to decode the data transmission instruction to obtain the data storage address, acquire the data to be transmitted based on the data storage address, and store the data to be transmitted into a private cache region of the receiving core.
Optionally, the multi-core chip further includes: a storage section for storing data corresponding to tasks to be executed, the storage section comprising a first-level cache region and a second-level cache region for each of the sending core and the receiving core, and a shared memory area and a Dynamic Random Access Memory (DRAM) common to the sending core and the receiving core; an arbitration section for arbitrating the execution order between the transmission of data scheduled from the storage section and the current processing task of the corresponding core; and an execution section for executing the data transmission or a task interruption according to the arbitration result of the arbitration section.
Optionally, the storage section is further configured with: a state machine, and the state machine is configured to monitor and manage a data state of the data stored by the storage portion.
Optionally, the state machine manages the data state by: when the data to be transmitted is stored in a private cache region of the receiving core, the state machine updates the data state of the data to be transmitted from a private state to a shared state; and/or for the situation that the sending core and the receiving core respectively cache data corresponding to the same task in the private cache region, if one of the sending core and the receiving core processes the data, the state machine modifies the data state of the corresponding data in the private cache region of the other one into an invalid state.
Optionally, the first level cache region and/or the second level cache region are configured to support the transmission of the data transmission instruction; and/or establishing an instruction channel in the storage part corresponding to each of the sending core and the receiving core for supporting the transmission of the data transmission instruction.
Optionally, the arbitration section is configured with: and the scheduler is used for comparing the priority between the data transmission instruction and the current processing task of the receiving core when the receiving core receives the data transmission instruction, and interrupting the current processing task when the priority of the data transmission instruction is higher.
Optionally, the scheduler comprises: a priority arbiter that is pre-configured and that is configured with a corresponding priority arbiter address in the data transfer instruction for performing a priority comparison between the data transfer instruction and a current processing task of a receiving core; an interrupt signal stack for instructing the execution section to interrupt a current processing task of a receiving core when a priority of the data transfer instruction is higher than a priority of the current processing task; and the field signal stack is used for writing the execution data corresponding to the interrupted current processing task into the shared memory of the sending core and the receiving core.
Optionally, the execution part is configured with: and the interrupt controller is used for responding to the arbitration result of the arbitration part and executing task interrupt operation.
Optionally, the multi-core chip is configured to adopt an AMP structure, and in the AMP structure, one of the sending core and the receiving core is configured to run an operating system, and the other is configured to run a bare metal program.
An embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium has instructions stored thereon, and the instructions are configured to cause a machine to execute any inter-core data transmission method described above.
With this technical solution, the data to be processed is cached in the private cache region of the receiving core by means of the data transmission instruction, so that when the receiving core processes the corresponding data task it can hit the data directly in its private cache region without accessing the shared memory, which greatly reduces clock consumption and improves processor performance.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention and not to limit the embodiments of the invention. In the drawings:
FIG. 1 is a schematic diagram of an example dual-core processor in AMP mode;
FIG. 2 is a schematic diagram of inter-core data transfer based on the IPC mechanism, implemented in AMP mode;
FIG. 3 is a flow chart illustrating a method for inter-core data transmission according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the priority-arbitration principle of the arbiter and of the scheduler in an example of an embodiment of the present invention;
FIG. 5 is a flow chart of data transmission in an example of an embodiment of the present invention; and
FIG. 6 is a schematic structural diagram of a multi-core chip according to an embodiment of the present invention.
Description of the reference numerals
410. Priority arbiter; 420. Interrupt signal stack; 430. Field protection stack; 610. Storage section; 620. Arbitration section; 630. Execution section.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
Before describing the specific scheme of the embodiments of the present invention, the asymmetric multiprocessing (AMP) structure and the Inter-Process Communication (IPC) mechanism implemented on it, on which the embodiments rely, are first described so that those skilled in the art can understand the embodiments more clearly.
FIG. 1 is a schematic diagram of an example dual-core processor in AMP mode. As shown in fig. 1, in AMP mode one core (Core0) runs an operating system and the other core (Core1) runs a bare-metal program. Each core has a private cache, i.e., a first-level cache and a second-level cache, and both cores are connected to a shared memory (the Last-Level Cache, LLC) and a Dynamic Random Access Memory (DRAM). In addition, the dual-core processor supports an IPC mechanism.
It should be noted that, the embodiment of the present invention takes a dual-core processor as an example, but is not limited to dual cores.
Taking a dual-core processor comprising a sending core and a receiving core as an example, fig. 2 is a schematic diagram of inter-core data transmission based on the IPC mechanism in AMP mode, where the processor includes a sending core Core0 and a receiving core Core1, and the gray parts of the two cores in the figure represent their respective private caches. As shown in fig. 2, data is cached in a core's private cache region, and when a core needs to perform a data processing task it hits the cached data through the relevant instructions. For example, after Core0 completes a data processing task, it notifies the physical memory (DRAM) and the shared memory of the address of the related data, updates the data state, and stores the data back at its original storage address in the private cache region; Core1 then caches the data in its own private cache region to read and process it, after which Core1 in turn notifies the updated data state and stores the data.
It can be seen that, in the IPC-based inter-core data transmission scheme, because the transfer goes through shared memory or physical memory (the shared-memory case is mainly used as the example below), the receiving core has to access the shared memory or the DRAM when reading the data. Moreover, because the two cores process the data cooperatively, inconsistency between the data states read by the two cores causes conflicts. Both effects greatly increase the processor's latency and reduce its overall performance.
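For illustration only, the conventional shared-memory path just described can be sketched in C as follows; the helper functions and the mailbox used to pass the shared-memory address between the cores are assumptions, not interfaces defined by the patent.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed primitives: access to the shared memory region (LLC) and a
 * mailbox used to pass the shared-memory address between the cores. */
extern void *shared_mem_lookup(uintptr_t shared_addr);
extern uintptr_t ipc_wait_for_address(int from_core);

/* Conventional IPC flow: Core1 learns only an address in shared memory and
 * must go out to the LLC/DRAM on the first access, which costs far more
 * cycles than a private-cache hit. */
void core1_fetch_via_shared_memory(int tx_core, void *dst, size_t len)
{
    uintptr_t addr = ipc_wait_for_address(tx_core); /* address published by Core0 */
    memcpy(dst, shared_mem_lookup(addr), len);      /* access hits LLC or DRAM */
}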
Based on the AMP structure and the IPC mechanism described above, a new inter-core data transmission scheme is provided in the embodiments of the present invention, which will be described below.
Fig. 3 is a flowchart illustrating an inter-core data transmission method according to an embodiment of the present invention, which is applied to a multi-core chip having a sending core and a receiving core, where the multi-core chip includes, but is not limited to, a multi-core processor. Referring to fig. 3, the inter-core data transmission method may include the following steps S100 and S200:
step S100, at the sending core, sending a data transmission instruction to the receiving core, where the data transmission instruction includes a data storage address of data to be transmitted.
Step S200, at the receiving core end, decoding the data transmission instruction to obtain the data storage address, acquiring the data to be transmitted based on the data storage address, and storing the data to be transmitted into a private cache region of the receiving core.
For example, the data transmission instruction includes the data storage address of the data to be transmitted (generally the physical address of the data), so the receiving core can store the data in its own private cache region as directed by the instruction and then fetch it directly from that private cache when needed, instead of obtaining it through the shared memory. On this basis, when Core1 executes a task that needs the data, it can hit the data directly in its private cache region without accessing the shared memory, which greatly shortens the clock consumption caused by access latency, achieves high-speed inter-core data transmission, and improves the processor's overall real-time performance, reliability and stability.
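To make steps S100 and S200 concrete, the following C sketch models one possible encoding of the data transmission instruction and its handling on both ends; the structure layout and the ipc_send/cache_prefetch_private helpers are assumptions made for illustration and are not taken from the patent itself.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the data transmission instruction (step S100).
 * The method only requires that it carry the data storage address; the
 * extra fields mirror the optional data-state and arbiter-address items. */
typedef struct {
    uintptr_t data_addr;    /* physical address of the data to be transmitted */
    size_t    data_len;     /* length in bytes (assumed) */
    uint8_t   data_state;   /* state reported by the sender's state machine */
    uintptr_t arbiter_addr; /* address of the priority arbiter (optional) */
} xfer_instr_t;

/* Assumed low-level primitives provided by the chip or its SDK. */
extern void ipc_send(int target_core, const void *msg, size_t len);
extern void cache_prefetch_private(uintptr_t phys_addr, size_t len);

/* Sending core (step S100): issue the instruction to the receiving core. */
void send_xfer_instr(int rx_core, uintptr_t addr, size_t len, uint8_t state)
{
    xfer_instr_t instr = {
        .data_addr = addr, .data_len = len,
        .data_state = state, .arbiter_addr = 0,
    };
    ipc_send(rx_core, &instr, sizeof instr);
}

/* Receiving core (step S200): decode the instruction and pull the data
 * into its own private cache so that later accesses hit locally. */
void handle_xfer_instr(const void *raw)
{
    xfer_instr_t instr;
    memcpy(&instr, raw, sizeof instr);  /* "decode" the instruction */
    cache_prefetch_private(instr.data_addr, instr.data_len);
}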
In correspondence with the above description about the AMP structure, in the embodiment of the present invention, the multi-core chip is configured to adopt the AMP structure, and in the AMP structure, one of the sending core and the receiving core is configured to run an operating system, and the other is configured to run a bare metal program.
Further, considering that the receiving core may already be processing a task when it receives the data transmission instruction, before decoding the data transmission instruction (i.e., between step S100 and step S200 above), the inter-core data transmission method may further include:
step S300 (shown by a dashed line in the figure), when the receiving core receives the data transmission instruction, comparing the priority between the data transmission instruction and the current processing task of the receiving core, and when the priority of the data transmission instruction is higher, interrupting the current processing task.
For example, the data transfer instruction is prioritized over some private cache caching tasks of the receiving core.
For the "comparison" performed in step S300, the preferred implementation steps may include: configuring a priority discriminator address used for pointing to a preset priority discriminator (hereinafter referred to as a discriminator) in the data transmission instruction; and acquiring the address of the priority discriminator from the data transmission instruction to trigger the priority discriminator to perform priority comparison between the data transmission instruction and the current processing task of the receiving core.
For the "interruption" of the execution in this step S300, the preferred implementation steps may include: sending a signal to a preset interrupt controller through a preset interrupt signal stack so that the interrupt controller generates an interrupt signal to interrupt the current processing task; and writing the execution data corresponding to the interrupted current processing task into the shared memory of the sending core and the receiving core through a preset field signal stack.
The "comparison" and "interruption" realized in step S300 will be specifically described below by way of example.
Fig. 4 is a schematic diagram of priority arbitration by the arbiter in an example of an embodiment of the present invention, and fig. 5 is a flow chart of data transmission in that example. In this example, referring to figs. 4 and 5, the sending core Core0 sends a data transmission instruction to the receiving core Core1; the instruction contains a data storage address, an arbiter address and a data state, where the arbiter address points to the priority arbiter 410. The priority arbiter 410 compares the instruction's priority with that of the task Core1 is currently executing. When the data transmission instruction has a higher priority than the currently executing task, the interrupt signal stack 420 sends a signal to the Generic Interrupt Controller (GIC); after the GIC generates the interrupt signal, it delivers the interrupt to the receiving core and passes a field protection signal to the field protection stack 430. While the receiving core responds to the interrupt and suspends the task it was processing, the priority arbiter 410 writes that task (its running program, data, etc.) into the shared memory (e.g., registers) via the field protection stack 430 for field protection.
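A minimal C sketch of this arbitration and interrupt flow is given below; the priority representation, the gic_raise_sgi helper and the context-save routine are assumed placeholders rather than the actual hardware interfaces of the chip.

#include <stdbool.h>
#include <stdint.h>

/* Assumed helpers; real GIC programming and context saving are platform-specific. */
extern uint8_t current_task_priority(int core);
extern void    gic_raise_sgi(int target_core);          /* software-generated interrupt */
extern void    save_context_to_shared_memory(int core); /* field protection (430) */

/* Priority arbiter (410): compare the instruction's priority with the
 * priority of the task the receiving core is currently executing. */
bool arbitrate(uint8_t instr_prio, int rx_core)
{
    return instr_prio > current_task_priority(rx_core);
}

/* Interrupt signal stack (420) + field protection stack (430): if the
 * instruction wins, spill the interrupted task's state to the shared
 * memory for later restoration and interrupt the receiving core. */
void schedule_xfer(uint8_t instr_prio, int rx_core)
{
    if (arbitrate(instr_prio, rx_core)) {
        save_context_to_shared_memory(rx_core); /* field protection */
        gic_raise_sgi(rx_core);                 /* GIC delivers the interrupt */
    }
    /* otherwise the data transmission instruction waits for the current task */
}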
Further, as described above, the inter-core data transmission scheme under the IPC mechanism also suffers from conflicts caused by the two cores reading inconsistent data states. To address this, on the basis of the above steps S100 to S300, the inter-core data transmission method of the embodiment of the present invention is further configured as follows.
Preferably, the sending core and the receiving core are each configured with a state machine used to monitor and manage the data states in the corresponding core, and the data transmission instruction further includes the data state obtained through the state machine's monitoring. Accordingly, the inter-core data transmission method may further include step S400 and/or step S500 (neither shown in the figure):
step S400, when the data to be transmitted is stored in the private buffer of the receiving core, updating the data state of the data to be transmitted from a private state to a shared state by the state machine.
Step S500, in a case that both the sending core and the receiving core respectively cache data corresponding to the same task in the private cache area, if one of the sending core and the receiving core processes the data, modifying the data state of the corresponding data in the private cache area of the other one to be an invalid state by the state machine.
For example, a state machine is set up in each core by the compiler; the state machine records data storage addresses and marks a state for the data at each address. On this basis (referring to Core0 and Core1 in fig. 2 for step S400), when the system issues a cooperative data-processing command to the two cores, Core0 first caches the data in its private cache region while Core1 has not yet cached it; at this point the state machine marks the data's current state as private, the data is consistent with main memory, and the address points to the corresponding cache line in Core0's private cache region. When Core1 subsequently needs to read the related data, Core0 sends a data transmission instruction to Core1 containing the cache-line address of the data to be processed (i.e., the data storage address) and the data state (private at this time); Core1 receives and decodes the instruction and caches the data in its own private cache region, and the state machine then updates the data's current state to shared.
For example, in step S500, when both cores have cached the required data in their private cache regions and one of them then performs a processing task on the data, say Core0 processes it, the copy cached in Core1's private cache region becomes inconsistent with the data in Core0. The state machine therefore updates the data in Core1 to the invalid state, and the data is moved out of Core1's private cache region into the eviction region.
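The state transitions of steps S400 and S500 can be sketched in C as follows; only the private/shared/invalid states come from the description above, while the per-line table and function names are assumptions used for illustration.

#include <stdint.h>

/* Data states tracked by the state machine for each cached line. */
typedef enum {
    LINE_INVALID = 0, /* stale copy, moved out of the private cache */
    LINE_PRIVATE,     /* cached only by one core, consistent with memory */
    LINE_SHARED,      /* cached by both cores after a transfer (step S400) */
} line_state_t;

typedef struct {
    uintptr_t    addr;  /* data storage address of the cache line */
    line_state_t state;
} line_entry_t;

/* Step S400: the receiving core stored the transmitted data, so the line
 * is now held by both cores and becomes shared. */
void on_line_transferred(line_entry_t *line)
{
    if (line->state == LINE_PRIVATE)
        line->state = LINE_SHARED;
}

/* Step S500: one core modified a shared line, so the other core's copy
 * is invalidated and evicted from its private cache region. */
void on_line_modified_by_peer(line_entry_t *peer_copy)
{
    peer_copy->state = LINE_INVALID; /* line moves to the eviction region */
}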
The following further illustrates, by way of example, an inter-core data state snooping scheme implemented by using a state machine in the inter-core data transmission method according to the embodiment of the present invention.
Take an addition performed by the processor as an example: the system issues a command to both cores to add 1 to data a. Without the state machine, Core0 reads the data as "a = 1" and Core1 also reads "a = 1"; after both cores execute the command, the value obtained in memory is 2, whereas ideally, after the two cores each add 1 to a, the value should be 3. With the state machine, the cores can snoop the bus through their snooping mechanism, and the data cached in the related cache lines of the two cores' private cache regions is marked as shared. Now, when Core0 adds to the data, the new value is not only written back to memory; Core1 also updates the state and value of its cached copy, so that the private cache regions of the two cores share the data. After the command is executed, the value obtained in memory is 3. Therefore, by snooping the data state between cores, the embodiment of the present invention can reduce the processor's computational overhead and improve its computing performance.
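The addition example can be written out as a short sketch that models each core's privately cached copy explicitly; the run_addition_example routine and its single-threaded sequencing are simplifications used only to show why the result is 2 without snooping and 3 with it.

#include <stdbool.h>
#include <stdint.h>

static int32_t mem_a = 1;  /* the shared variable a in main memory */

/* Models the "a + 1 on both cores" example: each core works on a privately
 * cached copy of a. Without snooping both cores start from 1 and the final
 * value written back is 2; with snooping Core1's copy is refreshed after
 * Core0's write-back, so the final value is the expected 3. */
int32_t run_addition_example(bool snooping_enabled)
{
    int32_t core0_copy = mem_a;  /* Core0 caches a = 1 */
    int32_t core1_copy = mem_a;  /* Core1 caches a = 1 */

    core0_copy += 1;             /* Core0 executes the +1 command */
    mem_a = core0_copy;          /* write back: memory holds 2 */

    if (snooping_enabled)
        core1_copy = mem_a;      /* snoop refreshes the shared copy */

    core1_copy += 1;             /* Core1 executes the +1 command */
    mem_a = core1_copy;          /* 3 with snooping, 2 without */
    return mem_a;
}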
Further, the multi-core chip of the embodiment of the present invention adopts the AMP structure, i.e., one core runs an operating system (OS) as the master core and the remaining cores run bare-metal programs as slave cores. An AMP structure with master/slave cooperation is backward compatible and communicates efficiently; in certain scenarios it can exploit the multi-core processor's strengths of high performance and high real-time behaviour, and it avoids the drawbacks of the conventional symmetric structure (in which the cores run bare-metal programs and a virtualization layer is used to support multiple operating systems), namely poor real-time performance and high development difficulty.
Fig. 6 is a schematic structural diagram of a multi-core chip according to another embodiment of the present invention, which shares the inventive concept of the inter-core data transmission embodiments described above. The multi-core chip is based on the dual-core structure shown in fig. 1 and includes at least a sending core and a receiving core, wherein: the sending core is configured to send a data transmission instruction to the receiving core, the data transmission instruction comprising a data storage address of data to be transmitted; and the receiving core is configured to decode the data transmission instruction to obtain the data storage address, acquire the data to be transmitted based on the data storage address, and store the data to be transmitted into a private cache region of the receiving core.
With further reference to fig. 6, the multi-core chip includes: a storage section 610 for storing data corresponding to a task to be executed; an arbitration section 620 for making an arbitration as to an execution order between the transmission of the data scheduled from the storage section 610 and the current processing task of the corresponding core; and an execution section 630 for performing data transmission or task interruption according to an arbitration result of the arbitration section.
For example, the storage section 610 has an architecture similar to fig. 1, comprising: a first-level cache region and a second-level cache region (L2-Cache) for each of the sending core and the receiving core, where the first-level cache region includes a data cache (L1D-Cache) and an instruction cache (L1I-Cache); a shared memory area (L3-Cache, i.e., the LLC) shared by the sending core and the receiving core; and a DRAM common to both cores. The first-level cache region and/or the second-level cache region may be configured to support transmission of the data transmission instruction; and/or an instruction channel may be established in the storage section for each of the sending core and the receiving core to support transmission of the data transmission instruction.
Further, if the receiving core misses in the current level of its private cache (e.g., the first-level cache) when reading the data, it accesses the next level (e.g., the second-level cache); if it ultimately misses in all levels (i.e., a data miss occurs), the current data transmission instruction may be marked invalid, and the sending core must resend the instruction. If the sending core itself has a cache miss and the receiving core has already cached the related data, the current data transmission instruction becomes invalid and the sending core moves on to the data transmission for the next task; if the sending core has a cache miss and the receiving core has not cached the related data, the sending core continues to access its own next-level cache region.
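These miss-handling rules can be expressed as a small decision routine, sketched below in C; the cache_level_lookup and peer_holds_data helpers and the return codes are assumptions used only to make the rules explicit.

#include <stdbool.h>
#include <stdint.h>

typedef enum { XFER_OK, XFER_INSTR_INVALID, XFER_RETRY_NEXT_LEVEL } xfer_result_t;

/* Assumed helpers: look up an address in one private-cache level of a core,
 * and ask (e.g., via the state machine) whether the peer already holds it. */
extern bool cache_level_lookup(int core, int level, uintptr_t addr);
extern bool peer_holds_data(int peer_core, uintptr_t addr);
#define LAST_PRIVATE_LEVEL 2

/* Receiving core: walk its private cache levels; if the data is never found
 * (a data miss), the current data transmission instruction is marked invalid
 * and the sending core has to resend it. */
xfer_result_t rx_lookup(int rx_core, uintptr_t addr)
{
    for (int level = 1; level <= LAST_PRIVATE_LEVEL; level++)
        if (cache_level_lookup(rx_core, level, addr))
            return XFER_OK;
    return XFER_INSTR_INVALID;
}

/* Sending core on a cache miss: if the receiver already caches the related
 * data, the current instruction is dropped and the sender moves on to the
 * next task; otherwise the sender searches its own next cache level. */
xfer_result_t tx_on_miss(int rx_core, uintptr_t addr)
{
    return peer_holds_data(rx_core, addr) ? XFER_INSTR_INVALID
                                          : XFER_RETRY_NEXT_LEVEL;
}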
Preferably, the storage section 610 is further configured with: a state machine, and the state machine is configured to monitor and manage a data state of the data stored by the storage portion.
More preferably, the state machine manages the data state by: when the data to be transmitted is stored in a private cache region of the receiving core, the state machine updates the data state of the data to be transmitted from a private state to a shared state; and/or for the situation that both the sending core and the receiving core respectively cache data corresponding to the same task in the private cache region, if one of the sending core and the receiving core processes the data, the state machine modifies the data state of the corresponding data in the private cache region of the other one into an invalid state.
For specific application of the state machine, reference may be made to the foregoing embodiments related to the inter-core data transmission method, and details are not repeated here.
By way of further example, the arbitration section 620 is configured with: and the scheduler is used for comparing the priority between the data transmission instruction and the current processing task of the receiving core when the receiving core receives the data transmission instruction, and interrupting the current processing task when the priority of the data transmission instruction is higher. Wherein the arbitration part is, for example, a FABRIC arbitration mechanism (FABRIC-2 × 4) of a 2 × 4 interface, on which the scheduler can be configured.
Preferably, the scheduler structure may refer to fig. 4, including: a priority arbiter 410, which is pre-configured and configured with a corresponding priority arbiter address in the data transfer instruction, for performing a priority comparison between the data transfer instruction and a current processing task of a receiving core; an interrupt signal stack 420 for instructing the execution section to interrupt a current processing task of a receiving core when a priority of the data transfer instruction is higher than a priority of the current processing task; and a field signal stack 430, configured to write the execution data corresponding to the interrupted current processing task into the shared memory of the sending core and the receiving core.
By way of further example, the executing portion 630 is configured with: and the interrupt controller is used for responding to the arbitration result of the arbitration part and executing task interrupt operation. The interrupt controller is, for example, a GIC, and the FABRIC-2 × 4 mentioned above may be connected to the GIC through an interface.
It should be noted that, depending on requirements, the execution part may further include execution components such as a DDR (Double Data Rate) memory, a main fabric, a CIU (Control Interface Unit) and the like, and the arbitration part may be connected to these execution components through corresponding interfaces.
Furthermore, the multi-core chip of the embodiment of the present invention can adopt the AMP mode, support IPC, and is suitable for executing pipeline tasks. For example, in a pipeline task, after Core0 finishes processing task 0 on data 0, the new data obtained replaces the original data 0 and is transmitted to Core1 to execute task 1.
It should be noted that each core in the multi-core chip according to the embodiment of the present invention may be a sending core or a receiving core. In addition, the multi-core chip may be an independent chip structure, or may be integrated in a computing device for use. In addition, as described above, the multi-core chip is configured to adopt an AMP structure in which one of the transmitting core and the receiving core is configured to run an operating system and the other is configured to run a bare metal program.
When the multi-core chip of the embodiment of the invention processes data cooperatively, the data to be processed is cached in the private cache region of the receiving core through the data transmission instruction, so that when the receiving core processes the data task, the data can be directly hit in the private cache region without accessing the shared memory, the clock consumption is greatly reduced, and the real-time performance, the reliability, the stability and the like of the processor are greatly improved.
The embodiment of the present invention also provides a machine-readable storage medium, on which instructions are stored, where the instructions are used to enable a machine to execute the inter-core data transmission method described in the foregoing embodiment. The machine may be, for example, a computing device integrated with a dual-core processor, or the like.
An embodiment of the present invention further provides a computer program product which, when executed on a data processing device, is adapted to carry out the inter-core data transmission method described in the above embodiments.
The embodiment of the invention provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein the processor realizes the inter-core data transmission method in the embodiment when executing the program. The device of the embodiment of the invention is a computing device such as a server, a PC, a PAD, a mobile phone and the like.
The processor comprises at least two cores, and the cores call the corresponding program units from the memory to perform inter-core data transmission.
The memory may include volatile memory in a computer readable medium, random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (15)

1. An inter-core data transmission method is applied to a multi-core chip having a sending core and a receiving core, and the inter-core data transmission method includes:
at a sending core end, sending a data transmission instruction to a receiving core, wherein the data transmission instruction comprises a data storage address of data to be transmitted; and
at a receiving core end, decoding the data transmission instruction to obtain the data storage address, acquiring the data to be transmitted based on the data storage address, and storing the data to be transmitted into a private cache region of the receiving core;
the sending core and the receiving core are respectively configured with a state machine, the state machines are used for monitoring and managing data states in the corresponding cores, and the data transmission instruction further includes the data states monitored by the state machines.
2. The inter-core data transmission method according to claim 1, further comprising:
when the data to be transmitted is stored in a private cache region of the receiving core, updating the data state of the data to be transmitted from a private state to a shared state through the state machine; and/or
For the situation that the sending core and the receiving core respectively cache data corresponding to the same task in the private cache region, if one of the sending core and the receiving core processes the data, the state machine modifies the data state of the corresponding data in the private cache region of the other core into an invalid state.
3. The inter-core data transmission method according to claim 1, wherein before the decoding of the data transmission instruction, the inter-core data transmission method further comprises:
and when the receiving core receives the data transmission instruction, comparing the priority between the data transmission instruction and the current processing task of the receiving core, and interrupting the current processing task when the priority of the data transmission instruction is higher.
4. The method of claim 3, wherein the comparing the priority between the data transmission instruction and the current processing task of the receiving core comprises:
configuring, in the data transmission instruction, a priority arbiter address used for pointing to a preset priority arbiter; and
acquiring the priority arbiter address from the data transmission instruction to trigger the priority arbiter to perform the priority comparison between the data transmission instruction and the current processing task of the receiving core.
5. The method of claim 3, wherein the interrupting the current processing task comprises:
sending a signal to a preset interrupt controller through a preset interrupt signal stack so that the interrupt controller generates an interrupt signal to interrupt the current processing task; and
and writing the execution data corresponding to the interrupted current processing task into the shared memory of the sending core and the receiving core through a preset field signal stack.
6. The inter-core data transmission method according to any one of claims 1 to 5, wherein the multi-core chip is configured to adopt an asymmetric multiprocessing (AMP) structure, and in the AMP structure, one of the sending core and the receiving core is configured to run an operating system and the other is configured to run a bare-metal program.
7. A multi-core chip, wherein the multi-core chip includes at least a transmitting core and a receiving core, and wherein:
the sending core is configured to send a data transmission instruction to the receiving core, wherein the data transmission instruction comprises a data storage address of data to be transmitted; and
the receiving core is configured to decode the data transmission instruction to obtain the data storage address, acquire the data to be transmitted based on the data storage address, and store the data to be transmitted into a private cache region of the receiving core;
wherein the storage portion is further configured with a state machine configured to monitor and manage a data state of the data stored by the storage portion.
8. The multi-core chip of claim 7, further comprising:
the storage part is used for storing data corresponding to the task to be executed, and the storage part comprises: the sending core and the receiving core are respectively provided with a first-level cache region and a second-level cache region; the sending core and the receiving core share a shared memory area and a Dynamic Random Access Memory (DRAM);
an arbitration section for performing arbitration on an execution order between transmission of data scheduled from the storage section and a current processing task of the corresponding core; and
and the execution part is used for executing data transmission or task interruption according to the arbitration result of the arbitration part.
9. The multi-core chip of claim 8, wherein the state machine manages the data states by:
when the data to be transmitted is stored in a private cache region of the receiving core, the state machine updates the data state of the data to be transmitted from a private state to a shared state; and/or
For the situation that the sending core and the receiving core respectively cache data corresponding to the same task in the private cache region, if one of the sending core and the receiving core processes the data, the state machine modifies the data state of the corresponding data in the private cache region of the other one into an invalid state.
10. The multi-core chip of claim 8, wherein the level one and/or level two cache memory regions are configured to support transmission of the data transfer instruction; and/or establishing an instruction channel in the storage part corresponding to each of the sending core and the receiving core for supporting the transmission of the data transmission instruction.
11. The multi-core chip according to claim 8, wherein the arbitration section is configured with:
and the scheduler is used for comparing the priority between the data transmission instruction and the current processing task of the receiving core when the receiving core receives the data transmission instruction, and interrupting the current processing task when the priority of the data transmission instruction is higher.
12. The multi-core chip of claim 11, wherein the scheduler comprises:
a priority arbiter, which is pre-configured and configured with a corresponding priority arbiter address in the data transfer instruction, for performing a priority comparison between the data transfer instruction and a current processing task of a receiving core;
an interrupt signal stack for instructing the execution section to interrupt a current processing task of a receiving core when a priority of the data transfer instruction is higher than a priority of the current processing task; and
and the field signal stack is used for writing the execution data corresponding to the interrupted current processing task into the shared memory of the sending core and the receiving core.
13. The multi-core chip of claim 8, wherein the execution portion is configured with:
and the interrupt controller is used for responding to the arbitration result of the arbitration part and executing task interrupt operation.
14. The multi-core chip according to any one of claims 7 to 13, wherein the multi-core chip is configured to adopt an asymmetric multiprocessing (AMP) structure, and in the AMP structure, one of the sending core and the receiving core is configured to run an operating system and the other is configured to run a bare-metal program.
15. A machine-readable storage medium having stored thereon instructions for causing a machine to perform the inter-core data transfer method of any one of claims 1-6.
CN202210918127.6A 2022-08-01 2022-08-01 Inter-core data transmission method, multi-core chip and machine-readable storage medium Pending CN115481072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210918127.6A CN115481072A (en) 2022-08-01 2022-08-01 Inter-core data transmission method, multi-core chip and machine-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210918127.6A CN115481072A (en) 2022-08-01 2022-08-01 Inter-core data transmission method, multi-core chip and machine-readable storage medium

Publications (1)

Publication Number Publication Date
CN115481072A true CN115481072A (en) 2022-12-16

Family

ID=84421916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210918127.6A Pending CN115481072A (en) 2022-08-01 2022-08-01 Inter-core data transmission method, multi-core chip and machine-readable storage medium

Country Status (1)

Country Link
CN (1) CN115481072A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028422A (en) * 2023-02-14 2023-04-28 北京智芯微电子科技有限公司 Heterogeneous multi-core system, inter-core communication method thereof, chip and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination