CN113806250A

CN113806250A - Method for coordinating general processor core and vector component, interface and processor

Info

Publication number: CN113806250A
Application number: CN202111123314.7A
Authority: CN
Inventors: 郭维; 邓全; 雷国庆; 郭辉; 王俊辉; 郑重; 黄立波; 隋兵才; 倪晓强; 孙彩霞; 王永文
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2021-09-24
Filing date: 2021-09-24
Publication date: 2021-12-17
Anticipated expiration: 2041-09-24
Also published as: CN113806250B

Abstract

The invention discloses a method for coordinating a general-purpose processor core with a vector unit, together with a corresponding interface and processor. In the method, the vector unit receives a vector request and determines what the request requires: if cached data must be accessed, it sends a memory access request to the processor data cache unit of the general-purpose processor core; if register access is required, it sends a register request to the processor register unit of the general-purpose processor core; if floating-point calculation is required, it sends a floating-point calculation request to the processor floating-point calculation unit of the general-purpose processor core; and if page table translation is required, it sends a page table translation request to the processor page table translation unit of the general-purpose processor core. By using an efficient, loosely coupled dedicated interface between the general-purpose processor core and the vector unit, the invention lets the vector unit interact directly with the control unit, register unit, data cache unit, floating-point calculation unit, page table translation unit and other related units, which improves the data read/write efficiency and the instruction execution efficiency of the vector unit.

Description

Method for coordinating general processor core and vector component, interface and processor

Technical Field

The invention relates to a design technology of a cooperation interface of a general processor core and a vector component, in particular to a cooperation method of the general processor core and the vector component, an interface and a processor.

Background

With the continuous development of computer technology, the demand for different kinds of computation keeps growing. Because of their design goals and for other reasons, general-purpose processors are usually inefficient at vector computation or neural network computation and cannot provide the computing speed or data scale that such workloads require. At the same time, vector and neural network workloads still contain a substantial amount of general-purpose computation, while vector components typically provide little or no general-purpose computing capability. Solutions have therefore emerged in which a general-purpose processor core cooperates with a dedicated graphics accelerator (e.g., CPU-GPU), but vendors of vector components typically provide programming support only for their own devices. In heterogeneous systems it is generally difficult to program different devices in a single programming style, and equally difficult to treat different devices as unified computing units. Taking GPU acceleration as an example, the two major vendors NVIDIA and AMD develop and maintain independent GPU programming toolkits, the CUDA SDK and AMD APP (ATI Stream), respectively. Such explicitly heterogeneous solutions therefore place high demands on programming and heterogeneous coordination, especially memory management: programmers must explicitly declare data and explicitly move it between main memory and device memory, which makes practical application very difficult.

To provide vector-oriented computing power in a general-purpose processor, using a separate vector component is an efficient solution: a dedicated instruction is reserved in the instruction set design, a dedicated interface is reserved in the general-purpose processor core, and the vector component and the general-purpose processor interact through an interface protocol. This scheme fully preserves the general processing capability of the general-purpose processor while allocating the accelerable part of vector computation to the vector component, so that the efficiency of both vector computation and general computation is maximized. However, in existing designs the control and data signals interact too frequently and the pipeline timing is tight; in addition, specific resources in the vector component are insufficient while general-purpose processor resources sit idle and are wasted.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a method for coordinating a general-purpose processor core with a vector unit, an interface and a processor, in which an efficient, loosely coupled dedicated interface between the general-purpose processor core and the vector unit lets the vector unit interact directly with the control unit, register unit, data cache unit, floating-point calculation unit, page table translation unit and other related units, thereby improving the data read/write efficiency and the instruction execution efficiency of the vector unit.

In order to solve the technical problems, the invention adopts the technical scheme that:

a method of coordinating a general purpose processor core with a vector component, comprising:

1) the vector unit receives a vector request sent by the general-purpose processor core;

2) the vector component determines whether the vector request needs to access cache data and, if so, sends a memory access request to the processor data cache component of the general-purpose processor core, so that the processor data cache component processes the memory access request and returns its data or state; the vector component determines whether the vector request needs register access and, if so, sends a register request to the processor register component of the general-purpose processor core, so that the processor register component processes the register request and returns its data or state; the vector component determines whether the vector request needs floating-point calculation and, if so, sends a floating-point calculation request to the processor floating-point calculation component of the general-purpose processor core, so that the processor floating-point calculation component processes the request and returns a floating-point calculation result; the vector component determines whether the vector request needs page table translation and, if so, sends a page table translation request to the processor page table translation component of the general-purpose processor core, so that the processor page table translation component processes the request and returns a page table translation result;

3) the vector unit determines whether all operations of the vector request are finished; if not, it returns to step 2); otherwise the vector unit returns the final vector computation result to the general-purpose processor core.
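The loop formed by steps 2) and 3) can be illustrated with a small software model. This is only a sketch under stated assumptions: the kind labels ("mem", "reg", "fp", "ptw"), the `VectorRequest` class, the `run_vector_request` function and the behaviour of the stand-in core units are all hypothetical and are not taken from the patent, which describes hardware signals rather than software calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class VectorRequest:
    req_id: int
    # Sub-operations still to be served, as (kind, payload) pairs.
    # The kind labels "mem", "reg", "fp", "ptw" are hypothetical.
    pending: List[Tuple[str, object]] = field(default_factory=list)


def run_vector_request(req: VectorRequest,
                       core_units: Dict[str, Callable[[object], object]]) -> dict:
    """Serve one vector request by looping over steps 2) and 3)."""
    responses = []
    while req.pending:                      # step 3: not finished, so repeat step 2
        kind, payload = req.pending.pop(0)
        unit = core_units[kind]             # step 2: pick the matching core unit
        responses.append((kind, unit(payload)))
    # step 3, finished: hand the final result back to the core
    return {"req_id": req.req_id, "responses": responses}


# Stand-ins for the four core units; their behaviour is invented for the demo.
registers = {3: 7}
memory = {0x2000: 42}
core_units = {
    "ptw": lambda va: va + 0x1000,                 # page table translation
    "mem": lambda pa: memory[pa],                  # data cache read
    "reg": lambda idx: registers[idx],             # register read
    "fp": lambda ops: ops[0] * ops[1] + ops[2],    # floating-point op on A, B, C
}

request = VectorRequest(req_id=0, pending=[
    ("ptw", 0x1000), ("mem", 0x2000), ("reg", 3), ("fp", (1.5, 2.0, 0.5)),
])
result = run_vector_request(request, core_units)
```

In this model each sub-operation is served synchronously; in the interface described here each one would instead be a request and response exchanged over its own path.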

Optionally, step 1) comprises: 1.1) the vector unit receives, through a control path, a vector request sent by the general-purpose processor core; the received vector request comprises the operation type, the request number, source operands A, B and C, and the result register number, sent respectively through the vector calculation request type, vector calculation request number, vector calculation request source operand A, vector calculation request source operand B, vector calculation request source operand C and vector calculation request result register number ports, and the signals sent on these ports are marked valid by the vector calculation request valid bit; 1.2) after receiving and saving the vector request, the vector unit notifies the processor control unit of the general-purpose processor core through the vector request receive feedback bit port of the control path.

Optionally, sending the memory access request to the processor data cache component of the general-purpose processor core in step 2) includes: 2.1A) for the cache data to be accessed, the memory access operation type, memory access request number, memory access request physical address and memory access request write data are sent to the processor data cache component through the memory access request type, memory access request number, memory access request physical address and memory access request write data ports of the memory access path, respectively, and these signals are marked valid by the vector component memory access request valid bit; 2.2A) after receiving and saving the memory access request, the processor data cache component notifies the vector component through the vector component memory access request receive feedback bit; 2.3A) if the memory access operation type is a data read, then after completing the read the processor data cache component sends the acquired data, the data state and the corresponding memory access request number to the vector component through the memory access request return data, memory access request return data state and memory access request return data number ports, and these signals are marked valid by the vector component memory access request return data valid bit; 2.4A) if the memory access operation type is a data write, then after writing the memory access write data to the specified location the processor data cache component sends the data state and the corresponding memory access request number to the vector component through the memory access request return data state and memory access request return data number ports, and these signals are marked valid by the vector component memory access request return data valid bit.
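As an illustration of steps 2.1A) to 2.4A), the following sketch models the request and response bundles of the memory access path in software. The class and field names, the "read"/"write" encoding and the "ok"/"fault" states are assumptions made for the example; the patent does not specify signal encodings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemReq:
    req_type: str                  # "read" or "write" (hypothetical encoding)
    req_id: int                    # memory access request number
    phys_addr: int                 # memory access request physical address
    wr_data: Optional[int] = None  # write data, used only for writes


@dataclass
class MemRsp:
    rsp_id: int                    # request number echoed back
    state: str                     # "ok" / "fault" (hypothetical states)
    data: Optional[int] = None     # return data, present only for reads


class DataCacheModel:
    """Toy stand-in for the processor data cache component."""

    def __init__(self) -> None:
        self.lines: Dict[int, int] = {}

    def handle(self, req: MemReq) -> MemRsp:
        if req.req_type == "write":
            self.lines[req.phys_addr] = req.wr_data
            return MemRsp(req.req_id, "ok")          # 2.4A: state and number only
        if req.phys_addr in self.lines:
            return MemRsp(req.req_id, "ok", self.lines[req.phys_addr])  # 2.3A
        return MemRsp(req.req_id, "fault")


cache = DataCacheModel()
write_rsp = cache.handle(MemReq("write", req_id=1, phys_addr=0x80, wr_data=99))
read_rsp = cache.handle(MemReq("read", req_id=2, phys_addr=0x80))
```

Echoing the request number in every response lets the vector component match a response to the request that caused it without relying on a fixed response latency.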

Optionally, sending the register request to the processor register unit of the general-purpose processor core in step 2) includes: 2.1B) for the register access to be performed, the register operation type, register number and register write data are sent to the processor register unit of the general-purpose processor core through the register request type, register request register number and register request write data ports of the register path, respectively, and these signals are marked valid by the vector unit register request valid bit; 2.2B) after receiving and saving the register request, the processor register unit notifies the vector unit through the vector unit register request receive feedback bit port of the register path; 2.3B) if the register operation type is a data read, then after completing the register read the processor register unit sends the acquired data, the data state and the corresponding register number to the vector unit through the register request return data, register request return data state and register request return data register number ports, and these signals are marked valid by the vector unit register request return data valid bit; 2.4B) if the register operation type is a data write, then after writing the register write data to the designated register the processor register unit sends the register state and the corresponding register number to the vector unit through the register request return data state and register request return data register number ports, and these signals are marked valid by the vector unit register request return data valid bit.

Optionally, sending the floating-point calculation request to the processor floating-point calculation unit of the general-purpose processor core in step 2) includes: 2.1C) for the floating-point calculation to be performed, the floating-point calculation type, floating-point calculation request number and source operands A, B and C are sent to the processor floating-point calculation unit through the floating-point calculation request type, floating-point calculation request number, floating-point calculation request source operand A, floating-point calculation request source operand B and floating-point calculation request source operand C ports of the floating-point calculation path, respectively, and these signals are marked valid by the vector unit floating-point calculation request valid bit; 2.2C) after receiving and saving the floating-point calculation request, the processor floating-point calculation unit notifies the vector unit through the floating-point calculation request receive feedback bit of the floating-point calculation path; 2.3C) after completing the corresponding floating-point calculation, the processor floating-point calculation unit sends the floating-point calculation result data, the floating-point calculation result state and the corresponding floating-point calculation result number to the vector unit through the floating-point calculation result data, floating-point calculation result state and floating-point calculation result number ports of the floating-point calculation path, and these signals are marked valid by the floating-point calculation result valid bit.

Optionally, sending the page table translation request to the processor page table translation component of the general-purpose processor core in step 2) includes: 2.1D) for a vector memory access address that needs page table translation, the virtual address and the request number are sent to the processor page table translation component through the vector memory access address page table translation request virtual address and vector memory access address page table translation request number ports of the page table translation path, respectively, and these signals are marked valid by the vector memory access address page table translation request valid bit; 2.2D) after completing the page table translation of the corresponding vector memory access address, the processor page table translation component sends the returned physical address, the state of the returned physical address and the corresponding response number to the vector component through the vector memory access address page table translation request return physical address, vector memory access address page table translation request return physical address state and vector memory access address page table translation return physical address response number ports of the page table translation path, and these signals are marked valid by the vector memory access address page table translation request return physical address valid bit.
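A minimal software sketch of steps 2.1D) and 2.2D) follows. It assumes a single-level page table held in a dictionary and 4 KiB pages; the page size, the `TranslateRsp` class, the `translate` function and the "ok"/"page_fault" states are hypothetical and chosen only to make the request/response pairing concrete.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PAGE_BITS = 12  # assumed 4 KiB pages; the source does not specify a page size


@dataclass
class TranslateRsp:
    rsp_id: int                 # response number matching the request number
    state: str                  # "ok" or "page_fault" (hypothetical states)
    phys_addr: Optional[int]    # returned physical address, if any


def translate(page_table: Dict[int, int], req_id: int, virt_addr: int) -> TranslateRsp:
    """Model of steps 2.1D/2.2D: virtual address in, physical address and state out."""
    vpn = virt_addr >> PAGE_BITS                      # virtual page number
    offset = virt_addr & ((1 << PAGE_BITS) - 1)       # offset within the page
    if vpn not in page_table:
        return TranslateRsp(req_id, "page_fault", None)
    return TranslateRsp(req_id, "ok", (page_table[vpn] << PAGE_BITS) | offset)


page_table = {0x12: 0x7A}   # virtual page 0x12 maps to physical frame 0x7A
hit = translate(page_table, req_id=5, virt_addr=0x12345)
miss = translate(page_table, req_id=6, virt_addr=0x99000)
```

Here `hit.phys_addr` is 0x7A345: the frame number replaces the page number and the 12-bit offset is carried over unchanged.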

Optionally, returning the final vector calculation result to the general-purpose processor core in step 3) includes: after completing all operations required by the vector request Vec Inst0, the vector component sends the final vector calculation result state, the vector calculation result number and the vector component state to the processor control component through the vector calculation result state, vector calculation result number and vector component state ports of the control path, and these signals are marked valid by the vector calculation result valid bit.

In addition, the invention provides a cooperation interface between a general-purpose processor core and a vector unit for applying the aforementioned cooperation method, wherein the cooperation interface comprises: a control path, through which the processor control unit of the general-purpose processor core and the vector unit exchange operation instructions and states; a register path, through which the processor register unit of the general-purpose processor core and the vector unit exchange register requests and data; a memory access path, through which the vector unit and the processor data cache unit of the general-purpose processor core exchange memory access requests and data; a floating-point calculation path, through which the vector unit and the processor floating-point calculation unit of the general-purpose processor core exchange floating-point operation requests and data; and a page table translation path, through which the vector unit and the processor page table translation unit of the general-purpose processor core exchange page table translation requests and data; the control path, the register path, the memory access path, the floating-point calculation path and the page table translation path are each connected between the general-purpose processor core and the vector unit.

Optionally, the control path includes a vector request valid bit, a vector request type, a vector request number, a vector request source operand A, a vector request source operand B, a vector request source operand C, a vector request result register number, a vector request receive feedback bit, a vector calculation result valid bit, a vector calculation result state, a vector calculation result number and a vector component state, wherein:
the vector request valid bit is sent by the general-purpose processor and indicates that a vector request is being sent;
the vector request type is sent by the general-purpose processor and indicates the operation type of the vector request;
the vector request number is sent by the general-purpose processor and indicates the number of the vector request within the general-purpose processor;
the vector request source operand A is sent by the general-purpose processor and carries the data or register number of operand A of the vector request;
the vector request source operand B is sent by the general-purpose processor and carries the data or register number of operand B of the vector request;
the vector request source operand C is sent by the general-purpose processor and carries the data or register number of operand C of the vector request;
the vector request result register number is sent by the general-purpose processor and indicates the result register number to be used by the vector request;
the vector request receive feedback bit is sent by the vector component and indicates that the vector request was successfully received by the vector component;
the vector calculation result valid bit is sent by the vector component and flags that the result of the vector request is valid;
the vector calculation result state is sent by the vector component and indicates the state of the result data of the vector request;
the vector calculation result number is sent by the vector component and indicates the number, within the general-purpose processor, of the request that produced the vector calculation result data;
the vector component state is sent by the vector component and indicates the running state of the vector component;

the register path includes a vector unit register request valid bit, a vector unit register request type, a vector unit register request register number, vector unit register request write data, a vector unit register request receive feedback bit, a vector unit register request return data valid bit, a vector unit register request return data state and a vector unit register request return data register number, wherein:
the vector unit register request valid bit is sent by the vector unit and flags that a vector unit register request is being sent;
the vector unit register request type is sent by the vector unit and indicates the operation type of the vector unit register request;
the vector unit register request register number is sent by the vector unit and indicates the register number targeted by the vector unit register request;
the vector unit register request write data is sent by the vector unit and carries the data to be written by the vector unit register request;
the vector unit register request receive feedback bit is sent by the processor register unit and indicates that the vector unit register request was successfully received by the register unit;
the vector unit register request return data valid bit is sent by the processor register unit and flags that the return data of the vector unit register request is valid;
the vector unit register request return data is sent by the processor register unit and carries the return data of the vector unit register request;
the vector unit register request return data state is sent by the processor register unit and indicates the state of the return data of the vector unit register request;
the vector unit register request return data register number is sent by the processor register unit and indicates the register number corresponding to the return data of the vector unit register request;

the memory access path comprises a vector component memory access request valid bit, a vector component memory access request type, a vector component memory access request number, a vector component memory access request physical address, vector component memory access request write data, a vector component memory access request receive feedback bit, a vector component memory access request return data valid bit, vector component memory access request return data, a vector component memory access request return data state and a vector component memory access request return data number, wherein:
the vector component memory access request valid bit is sent by the vector component and flags that a vector component memory access request is being sent;
the vector component memory access request type is sent by the vector component and indicates the operation type of the vector component memory access request;
the vector component memory access request number is sent by the vector component and indicates the number of the vector component memory access request;
the vector component memory access request physical address is sent by the vector component and indicates the physical address of the vector component memory access request;
the vector component memory access request write data is sent by the vector component and carries the data to be written by the vector component memory access request;
the vector component memory access request receive feedback bit is sent by the processor data cache component and indicates that the vector component memory access request was successfully received by the data cache component;
the vector component memory access request return data valid bit is sent by the processor data cache component and flags that the return data of the vector component memory access request is valid;
the vector component memory access request return data is sent by the processor data cache component and carries the return data of the vector component memory access request;
the vector component memory access request return data state is sent by the processor data cache component and indicates the state of the return data of the vector component memory access request;
the vector component memory access request return data number is sent by the processor data cache component and indicates the memory access request number corresponding to the return data of the vector component memory access request;

the floating-point calculation path comprises a floating-point calculation request valid bit, a floating-point calculation request type, a floating-point calculation request number, a floating-point calculation request source operand A, a floating-point calculation request source operand B, a floating-point calculation request source operand C, a floating-point calculation request receive feedback bit, a floating-point calculation result valid bit, floating-point calculation result data, a floating-point calculation result state and a floating-point calculation result number, wherein:
the floating-point calculation request valid bit is sent by the vector unit and flags that a floating-point calculation request is being sent;
the floating-point calculation request type is sent by the vector unit and indicates the operation type of the floating-point calculation request;
the floating-point calculation request number is sent by the vector unit and indicates the number of the floating-point calculation request;
the floating-point calculation request source operand A is sent by the vector unit and carries the data of operand A of the floating-point calculation request;
the floating-point calculation request source operand B is sent by the vector unit and carries the data of operand B of the floating-point calculation request;
the floating-point calculation request source operand C is sent by the vector unit and carries the data of operand C of the floating-point calculation request;
the floating-point calculation request receive feedback bit is sent by the processor floating-point calculation unit and indicates that the floating-point calculation request was successfully received by the floating-point unit;
the floating-point calculation result valid bit is sent by the processor floating-point calculation unit and flags that the result data of the floating-point calculation request is valid;
the floating-point calculation result data is sent by the processor floating-point calculation unit and carries the result data of the floating-point calculation request;
the floating-point calculation result state is sent by the processor floating-point calculation unit and indicates the state parameter of the result data of the floating-point calculation request;
the floating-point calculation result number is sent by the processor floating-point calculation unit and indicates the floating-point calculation request number corresponding to the result data;

the page table translation path comprises a vector memory access address page table translation request valid bit, a vector memory access address page table translation request virtual address, a vector memory access address page table translation request number, a vector memory access address page table translation request return physical address valid bit, a vector memory access address page table translation request return physical address state and a vector memory access address page table translation return physical address response number, wherein:
the vector memory access address page table translation request valid bit is sent by the vector component and flags that a vector memory access address page table translation request is being sent;
the vector memory access address page table translation request virtual address is sent by the vector component and indicates the virtual address of the vector memory access address page table translation request;
the vector memory access address page table translation request type is sent by the vector component and indicates the type of the vector memory access address page table translation request;
the vector memory access address page table translation request number is sent by the vector component and indicates the accelerated calculation request number corresponding to the vector memory access address page table translation request;
the vector memory access address page table translation request return physical address valid bit is sent by the general-purpose processor and flags that the returned physical address of the vector memory access address page table translation request is valid;
the vector memory access address page table translation request return physical address is sent by the general-purpose processor and carries the returned physical address of the vector memory access address page table translation request;
the vector memory access address page table translation request return physical address state is sent by the general-purpose processor and indicates the state of the returned physical address of the vector memory access address page table translation request;
the vector memory access address page table translation return physical address response number is sent by the general-purpose processor and indicates the accelerated calculation request number corresponding to the returned physical address of the vector memory access address page table translation request.
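The direction of each of the five paths can be summarised in a few lines of code. The sketch below only restates which side issues requests on each path according to the signal descriptions above; the `Path` class and the `requester`/`responder` field names are illustrative and not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    requester: str   # side that raises the request valid bit
    responder: str   # side that returns data, results, or state


VECTOR = "vector unit"

PATHS = [
    Path("control", "processor control unit", VECTOR),
    Path("register", VECTOR, "processor register unit"),
    Path("memory access", VECTOR, "processor data cache unit"),
    Path("floating-point calculation", VECTOR, "processor floating-point unit"),
    Path("page table translation", VECTOR, "processor page table translation unit"),
]

# Paths on which the vector unit is the requester rather than the responder.
vector_initiated = [p.name for p in PATHS if p.requester == VECTOR]
```

Only the control path is driven by the general-purpose processor core; on the other four paths the vector unit issues requests on its own, which matches the autonomous flow control described for the interface.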

In addition, the invention also provides a processor comprising a general-purpose processor core, an interface component and a vector component, wherein the general-purpose processor core is connected to the vector component through the interface component, and the interface component is the aforementioned cooperation interface between the general-purpose processor core and the vector component.

Compared with the prior art, the invention has the following advantages:

1. By using an efficient, loosely coupled dedicated interface between the general-purpose processor core and the vector component, the invention lets the vector component interact directly with the control component, register component, data cache component, floating-point calculation component, page table translation component and other related components, which improves the data read/write efficiency and the instruction execution efficiency of the vector component.

2. The interacting components in the invention use autonomous flow control and do not need real-time intervention by the processor control component, so the interaction is efficient.

3. During vector calculation, operations such as part of the floating-point calculation and the page table translation are completed by the corresponding components of the general-purpose processor, which reduces the hardware resource cost of implementing the vector component.

Drawings

FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an implementation of a control path in an embodiment of the present invention.

FIG. 3 is a diagram illustrating an implementation of a memory access path according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating an implementation of a register lane according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating an implementation of a floating-point computation path according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating an implementation of a page table translation path according to an embodiment of the present invention.

Fig. 7 is a schematic structural diagram of a cooperative interface in the embodiment of the present invention.

Detailed Description

As shown in fig. 1, the method for coordinating a general-purpose processor core and a vector unit of the present embodiment includes:

As shown in fig. 2, step 1) in the present embodiment includes:

1.1) the vector unit receives, through the control path, a vector request sent by the general-purpose processor core; the received vector request comprises the operation type (Vec_Type0), the request number (Vec_req_ID0), source operands A, B and C (Vec_req_SrcA0, Vec_req_SrcB0, Vec_req_SrcC0) and the result register number (Vec_req_Result_Reg_ID0), each sent through the corresponding vector calculation request port, and the signals sent on these ports are marked valid by the vector calculation request valid bit;

1.2) the vector unit informs the processor control unit of the general-purpose processor core through the vector request receiving feedback bit port of the control path after completing the receiving and saving of the vector request.

This step aims to complete the sending and receiving of vector requests quickly through simplified interface content and a handshake response. Unlike a conventional processor core, whose internal interface signals and response signals must meet strict timing intervals, the method and interface of this embodiment relax the timing requirements between interface modules, i.e. they are loosely coupled.
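The valid/feedback handshake of steps 1.1) and 1.2) can be sketched as a small cycle-level model. This is only an illustration under stated assumptions: the class and field names are hypothetical, and the "busy for a few cycles" behaviour of the receiver is invented to show that the sender tolerates an arbitrary acknowledgement delay.

```python
from dataclasses import dataclass


@dataclass
class VectorRequest:
    """Payload of step 1.1; field names are illustrative, not from the patent."""
    op_type: int
    req_id: int
    src_a: int
    src_b: int
    src_c: int
    result_reg_id: int


class VectorUnitFrontEnd:
    """Hypothetical receiver that is busy for a few cycles before accepting."""

    def __init__(self, busy_cycles: int):
        self.busy_cycles = busy_cycles
        self.saved = []

    def cycle(self, valid: bool, req: VectorRequest) -> bool:
        """Advance one cycle and return the receive feedback (ack) bit."""
        if self.busy_cycles > 0:
            self.busy_cycles -= 1
            return False
        if valid:
            self.saved.append(req)  # step 1.2: save the request, then acknowledge
            return True
        return False


def send_request(unit: VectorUnitFrontEnd, req: VectorRequest, max_cycles: int = 100) -> int:
    """Hold the valid bit high until the ack arrives; return cycles waited."""
    for waited in range(max_cycles):
        if unit.cycle(True, req):
            return waited
    raise TimeoutError("vector unit never acknowledged the request")
```

With a receiver that is busy for three cycles, `send_request` returns 3: the sender simply keeps the request valid until the feedback bit is seen, with no fixed response deadline.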

As shown in fig. 3, the sending of the memory access request to the processor data cache component of the general processor core in step 2) in this embodiment includes:

2.1A) for cache data that needs to be accessed, the memory access operation type (Mem_Req_Type0), the memory access request number (Mem_Req_ID0), the memory access request physical address (Mem_Req_PA0) and the memory access request write data (Mem_Req_Wr_Data0) are sent to the processor data cache unit through the memory access request type, memory access request number, memory access request physical address and memory access request write data ports of the memory access path, respectively, and these signals are identified as valid by the vector unit memory access request valid bit;

2.2A) after receiving and saving the memory access request, the processor data cache unit notifies the vector unit through the vector unit memory access request receive feedback bit;

2.3A) if the memory access operation type is a data read, then after completing the read the processor data cache unit sends the acquired data (Mem_Rsp_Data0), the data state (Mem_Rsp_State0) and the corresponding memory access request number (Mem_Rsp_ID0) to the vector unit through the memory access request return data, return data state and return data number ports, and these signals are identified as valid by the vector unit memory access request return data valid bit;

2.4A) if the memory access operation type is a data write, then after writing the memory access write data to the specified location the processor data cache unit sends the data state (Mem_Rsp_State0) and the corresponding memory access request number (Mem_Rsp_ID0) to the vector unit through the memory access request return data state and return data number ports, and these signals are identified as valid by the vector unit memory access request return data valid bit.
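As a rough illustration of steps 2.1A) to 2.4A), the sketch below models the request and response payloads and a toy data cache. The numeric encodings of the operation type and data state, and all class names, are assumptions made for the example; the patent does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

READ, WRITE = 0, 1   # assumed encodings of the memory access operation type
STATE_OK = 0         # assumed encoding of a successful data state


@dataclass
class MemReq:
    op: int
    req_id: int
    pa: int
    wr_data: Optional[int] = None


@dataclass
class MemRsp:
    req_id: int
    state: int
    data: Optional[int] = None  # only populated for reads (step 2.3A)


class DataCacheModel:
    """Hypothetical stand-in for the processor data cache unit."""

    def __init__(self):
        self.lines = {}

    def handle(self, req: MemReq) -> MemRsp:
        if req.op == READ:
            # Step 2.3A: return data, state and the matching request number.
            return MemRsp(req.req_id, STATE_OK, self.lines.get(req.pa, 0))
        # Step 2.4A: perform the write, then return state and request number only.
        self.lines[req.pa] = req.wr_data
        return MemRsp(req.req_id, STATE_OK)
```

A write followed by a read of the same physical address returns the written value, and each response carries the number of the request it answers.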

As shown in fig. 4, the sending of the register request to the processor register unit of the general-purpose processor core in step 2) in the embodiment includes:

2.1B) for a required register access, the register operation type (Reg_Req_Type0), the register number (Reg_Req_Reg_ID0) and the register write data (Reg_Req_Wr_Data0) are sent to the processor register unit of the general-purpose processor core through the register request type, register request register number and register request write data ports of the register path, respectively, and these signals are identified as valid by the vector unit register request valid bit;

2.2B) after receiving and saving the register request, the processor register unit notifies the vector unit through the vector unit register request receive feedback bit port of the register path;

2.3B) if the register operation type is a data read, then after completing the register read the processor register unit sends the acquired data (Reg_Rsp_Data0), the data state (Reg_Rsp_State0) and the corresponding register number (Reg_Rsp_ID0) to the vector unit through the register request return data, return data state and return data register number ports, and these signals are identified as valid by the vector unit register request return data valid bit;

2.4B) if the register operation type is a data write, then after writing the register write data into the designated register the processor register unit sends the register state (Reg_Rsp_State0) and the corresponding register number (Reg_Rsp_ID0) to the vector unit through the register request return data state and return data register number ports, and these signals are identified as valid by the vector unit register request return data valid bit.
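Because each register response carries the register number it refers to, the vector unit can match responses to outstanding requests without relying on their arrival order. The sketch below illustrates that bookkeeping; the class names, the operation encodings and the assumption that responses may arrive out of order are hypothetical and not stated in the patent.

```python
REG_READ, REG_WRITE = 0, 1   # assumed encodings of the register operation type


class RegisterFileModel:
    """Hypothetical stand-in for the processor register unit."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def handle(self, op, reg_id, wr_data=None):
        """Return (data, state, reg_id); data is None for writes (step 2.4B)."""
        if op == REG_WRITE:
            self.regs[reg_id] = wr_data
            return (None, 0, reg_id)
        return (self.regs[reg_id], 0, reg_id)


class PendingRegRequests:
    """Vector-unit side: remembers what was requested for each register."""

    def __init__(self):
        self.pending = {}
        self.read_values = {}

    def issue(self, op, reg_id):
        self.pending[reg_id] = op

    def retire(self, rsp):
        data, state, reg_id = rsp
        op = self.pending.pop(reg_id)  # match on the returned register number
        if op == REG_READ and state == 0:
            self.read_values[reg_id] = data
```

In this model, two reads issued for registers 3 and 4 are retired correctly even when the response for register 4 is delivered first.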

As shown in fig. 5, the sending of the floating point calculation request to the processor floating point calculation unit of the general-purpose processor core in step 2) of the present embodiment includes:

2.1C) for a required floating point calculation, the floating point calculation type (Fp_Req_Type0), the floating point calculation request number (Mem_Req_ID0) and source operands A, B and C (Fp_Req_SrcA0, Fp_Req_SrcB0, Fp_Req_SrcC0) are sent to the processor floating point calculation unit through the floating point calculation request type, request number, request source operand A, request source operand B and request source operand C ports of the floating point calculation path, respectively, and these signals are identified as valid by the vector unit floating point calculation request valid bit;

2.2C) after receiving and saving the floating point calculation request, the processor floating point calculation unit notifies the vector unit through the floating point calculation request receive feedback bit of the floating point calculation path;

2.3C) after completing the corresponding floating point calculation, the processor floating point calculation unit sends the resulting floating point calculation result data (Fp_Result_Data0), the floating point calculation result state (Fp_Result_State0) and the corresponding floating point calculation result number (Fp_Result_ID0) to the vector unit through the floating point calculation result data, result state and result number ports of the floating point calculation path, and these signals are identified as valid by the floating point calculation result valid bit.
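A floating point request with a type, a request number and three source operands can be modelled as below. The operation set (add, multiply, multiply-add) and the type and state encodings are assumptions chosen for illustration; the patent only states that a type and operands A, B and C are sent. Python evaluates `a * b + c` with two roundings, so this is not a true fused multiply-add.

```python
FP_ADD, FP_MUL, FP_MADD = 0, 1, 2   # assumed encodings of the calculation type
FP_OK, FP_UNSUPPORTED = 0, 1        # assumed encodings of the result state


def fp_unit(req_type, req_id, src_a, src_b, src_c):
    """Hypothetical processor floating point unit.

    Returns (result_data, result_state, result_id), mirroring step 2.3C.
    """
    if req_type == FP_ADD:
        data = src_a + src_b
    elif req_type == FP_MUL:
        data = src_a * src_b
    elif req_type == FP_MADD:
        data = src_a * src_b + src_c  # unfused: rounded after the multiply
    else:
        return (None, FP_UNSUPPORTED, req_id)
    return (data, FP_OK, req_id)
```

The result number returned with the data lets the vector unit associate the result with the request that produced it.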

As shown in fig. 6, the sending of the page table translation request to the processor page table translation unit of the general-purpose processor core in step 2) of this embodiment includes:

2.1D) for a vector memory access address that requires page table translation, the virtual address (Tbw_Req_Va0) and the page table translation request number (Tbw_Req_ID0) are sent to the processor page table translation unit through the vector memory access address page table translation request virtual address and request number ports of the page table translation path, respectively, and these signals are identified as valid by the vector memory access address page table translation request valid bit;

2.2D) after completing the page table translation of the corresponding vector memory access address, the processor page table translation unit sends the returned physical address (Tbw_Result_Pa0), the returned physical address state (Tbw_Result_State0) and the corresponding response number (Tbw_Result_ID0) to the vector unit through the page table translation request return physical address, return physical address state and return physical address response number ports of the page table translation path, and these signals are identified as valid by the vector memory access address page table translation request return physical address valid bit.
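The translation exchange of steps 2.1D) and 2.2D) can be illustrated with a single-level lookup. The 4 KiB page size, the flat dictionary page table and the state encodings are assumptions made for the example; the patent does not describe the page table format.

```python
PAGE_SHIFT = 12                     # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
TBW_OK, TBW_FAULT = 0, 1            # assumed encodings of the returned state


def translate(page_table, va, req_id):
    """Hypothetical page table translation unit.

    page_table maps virtual page numbers to physical page numbers.
    Returns (physical_address, state, response_id), mirroring step 2.2D.
    """
    vpn = va >> PAGE_SHIFT
    if vpn not in page_table:
        return (None, TBW_FAULT, req_id)
    pa = (page_table[vpn] << PAGE_SHIFT) | (va & PAGE_MASK)
    return (pa, TBW_OK, req_id)
```

For example, with virtual page 0x12 mapped to physical page 0x80, virtual address 0x12345 translates to 0x80345. The returned physical address is the kind of value a subsequent memory access request would carry on its physical address port.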

As shown in fig. 2, the returning of the final vector calculation result to the general-purpose processor core by the vector unit in step 3) of this embodiment includes: after completing all operations required by the vector request Vec_Inst0, the vector unit sends the final vector calculation result state (Vec_Result_State0), the vector calculation result number (Vec_Result_ID0) and the vector unit state (Vec_State) to the processor control unit through the vector calculation result state, vector calculation result number and vector unit state ports of the control path, and these signals are identified as valid by the vector calculation result valid bit.
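One way to decide when "all required operations" of a vector request are complete is to track the outstanding sub-requests and report a result only once none remain. The sketch below is an assumption about how such tracking might be done; the tracker class, the state encodings and the idle value reported for the vector unit state are all hypothetical.

```python
RESULT_OK, RESULT_ERROR = 0, 1   # assumed encodings of the result state
UNIT_IDLE = 0                    # assumed encoding of the vector unit state


class VectorRequestTracker:
    """Tracks the sub-requests issued on behalf of one vector request."""

    def __init__(self, request_id, sub_request_ids):
        self.request_id = request_id
        self.outstanding = set(sub_request_ids)
        self.failed = False

    def on_response(self, sub_request_id, state):
        """Record a response from the cache, register, FP or translation unit."""
        self.outstanding.discard(sub_request_id)
        if state != 0:
            self.failed = True

    def final_result(self):
        """Return (result_state, result_id, unit_state), or None if not done."""
        if self.outstanding:
            return None
        state = RESULT_ERROR if self.failed else RESULT_OK
        return (state, self.request_id, UNIT_IDLE)
```

Until the last sub-request has responded, `final_result` returns `None`, which corresponds to keeping the vector calculation result valid bit deasserted.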

The operations in step 2) are all initiated by the vector unit according to the operational needs of the actual vector request. Their purpose is to send the corresponding operation requests to the processor data cache unit, the processor register unit, the processor floating point calculation unit and the processor page table translation unit of the general-purpose processor core, and to receive the specified results or feedback, quickly through simplified interface content and handshake responses. Besides relaxing the strict timing requirements between interface modules (the loosely coupled characteristic), this approach avoids implementing additional functional units in the vector unit: compared with a conventional independent vector accelerator (e.g. a GPU), the vector unit reuses functional units that already exist in the general-purpose processor core and that are in fact idle most of the time while a vector request is being processed.
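The selection of sub-requests in step 2) amounts to routing each need of a vector request to the matching unit of the general-purpose processor core. A minimal sketch follows, assuming the needs of a request are given as a set of labels; the labels and the function name are hypothetical.

```python
# Need label (hypothetical) -> unit of the general-purpose processor core.
ROUTES = {
    "cache_data": "processor data cache unit",
    "register": "processor register unit",
    "floating_point": "processor floating point calculation unit",
    "page_table": "processor page table translation unit",
}


def plan_requests(needs):
    """Return (need, target unit) pairs for every need of a vector request."""
    unknown = set(needs) - set(ROUTES)
    if unknown:
        raise ValueError(f"unknown needs: {sorted(unknown)}")
    # Iterate over ROUTES so the output order is deterministic.
    return [(need, ROUTES[need]) for need in ROUTES if need in needs]
```

A request that needs both address translation and cached data yields two entries, one for the data cache unit and one for the page table translation unit; a request with no such needs yields an empty plan.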

As shown in fig. 7, the present embodiment further provides a cooperative interface of a general-purpose processor core and a vector unit, for applying the aforementioned cooperative method of a general-purpose processor core and a vector unit, where the cooperative interface includes:

a control path, through which the processor control unit of the general-purpose processor core and the vector unit transfer operation instructions and states;

a register path, through which the processor register unit of the general-purpose processor core and the vector unit transfer register requests and data;

a memory access path, through which the vector unit and the processor data cache unit of the general-purpose processor core transfer memory access requests and data;

a floating point calculation path, through which the vector unit and the processor floating point calculation unit of the general-purpose processor core transfer requests and data of floating point operations;

a page table translation path, through which the vector unit and the processor page table translation unit of the general-purpose processor core transfer requests and data of page table translation operations; the control path, the register path, the memory access path, the floating point calculation path and the page table translation path are each connected between the general-purpose processor core and the vector unit.

The control path is used by the processor control unit and the vector unit to transfer operation instructions and states. Referring to fig. 2, the control path in the present embodiment includes the following signals:

the vector request valid bit (core_vec_req_vld), sent by the general-purpose processor, flags the sending of a vector request;

the vector request type (core_vec_req_type), sent by the general-purpose processor, indicates the operation type of the vector request;

the vector request number (core_vec_req_id), sent by the general-purpose processor, indicates the number of the vector request within the general-purpose processor;

the vector request source operand A (core_vec_req_src), sent by the general-purpose processor, carries the data or register number of operand A of the vector request;

the vector request source operand B (core_vec_req_src), sent by the general-purpose processor, carries the data or register number of operand B of the vector request;

the vector request source operand C (core_vec_req_src), sent by the general-purpose processor, carries the data or register number of operand C of the vector request;

the vector request result register number (core_vec_req_result_reg_id), sent by the general-purpose processor, indicates the result register number to be used by the vector request;

the vector request receive feedback bit (vec_core_req_ack), sent by the vector unit, indicates that the vector request was successfully received by the vector unit;

the vector calculation result valid bit (vec_core_result_vld), sent by the vector unit, is the valid flag of the result of the vector request;

the vector calculation result state, sent by the vector unit, indicates the state of the result data of the vector request;

the vector calculation result number, sent by the vector unit, indicates the number, within the general-purpose processor, of the request that produced the vector calculation result data;

the vector unit state, sent by the vector unit, indicates the running state of the vector unit.

The memory access path is used by the vector unit and the processor data cache unit to transfer memory access requests and data. Referring to fig. 3, the memory access path in the present embodiment includes the following signals:

the vector unit memory access request valid bit (vec_core_mem_req_vld), sent by the vector unit, flags the sending of a vector unit memory access request;

the vector unit memory access request type (vec_core_mem_req_type), sent by the vector unit, indicates the operation type of the memory access request;

the vector unit memory access request number (vec_core_mem_req_id), sent by the vector unit, indicates the number of the memory access request;

the vector unit memory access request physical address (vec_core_mem_req_pa), sent by the vector unit, indicates the physical address of the memory access request;

the vector unit memory access request write data (vec_core_mem_req_wr_data), sent by the vector unit, carries the write data of the memory access request;

the vector unit memory access request receive feedback bit (core_vec_mem_req_ack), sent by the processor data cache unit, indicates that the memory access request was successfully received by the data cache unit;

the vector unit memory access request return data valid bit (core_vec_mem_rsp_vld), sent by the processor data cache unit, is the valid flag of the returned data;

the vector unit memory access request return data (core_vec_mem_rsp_data), sent by the processor data cache unit, carries the data returned for the memory access request;

the vector unit memory access request return data state (core_vec_mem_rsp_state), sent by the processor data cache unit, indicates the state of the returned data;

the vector unit memory access request return data number (core_vec_mem_rsp_id), sent by the processor data cache unit, indicates the memory access request number to which the returned data corresponds.

The register path is used by the processor register unit and the vector unit to transfer register requests and data. Referring to fig. 4, the register path in the present embodiment includes the following signals:

the vector unit register request valid bit (vec_core_reg_req_vld), sent by the vector unit, flags the sending of a vector unit register request;

the vector unit register request type (vec_core_reg_req_type), sent by the vector unit, indicates the operation type of the register request;

the vector unit register request register number (vec_core_reg_req_reg_id), sent by the vector unit, indicates the register number to which the register request applies;

the vector unit register request write data (vec_core_reg_req_wr_data), sent by the vector unit, carries the write data of the register request;

the vector unit register request receive feedback bit (core_vec_reg_req_ack), sent by the processor register unit, indicates that the register request was successfully received by the register unit;

the vector unit register request return data valid bit (core_vec_reg_rsp_vld), sent by the processor register unit, is the valid flag of the returned data;

the vector unit register request return data (core_vec_reg_rsp_data), sent by the processor register unit, carries the data returned for the register request;

the vector unit register request return data state (core_vec_reg_rsp_state), sent by the processor register unit, indicates the state of the returned data;

the vector unit register request return data register number (core_vec_reg_rsp_reg_id), sent by the processor register unit, indicates the register number to which the returned data corresponds.

The floating point calculation path is used by the vector unit and the processor floating point calculation unit to transfer requests and data of floating point operations. Referring to fig. 5, the floating point calculation path in the present embodiment includes the following signals:

the floating point calculation request valid bit (vec_core_fp_req_vld), sent by the vector unit, flags the sending of a floating point calculation request;

the floating point calculation request type (vec_core_fp_req_type), sent by the vector unit, indicates the operation type of the floating point calculation request;

the floating point calculation request number (vec_core_fp_req_id), sent by the vector unit, indicates the number of the floating point calculation request;

the floating point calculation request source operand A (vec_core_fp_req_src), sent by the vector unit, carries the data of operand A of the floating point calculation request;

the floating point calculation request source operand B (vec_core_fp_req_src), sent by the vector unit, carries the data of operand B of the floating point calculation request;

the floating point calculation request source operand C (vec_core_fp_req_src), sent by the vector unit, carries the data of operand C of the floating point calculation request;

the floating point calculation request receive feedback bit (core_vec_fp_req_ack), sent by the processor floating point calculation unit, indicates that the floating point calculation request was successfully received by the floating point unit;

the floating point calculation result valid bit (core_vec_result_vld), sent by the processor floating point calculation unit, is the valid flag of the result data of the floating point calculation request;

the floating point calculation result data (core_vec_fp_result), sent by the processor floating point calculation unit, carries the result data of the floating point calculation request;

the floating point calculation result state (core_result_status), sent by the processor floating point calculation unit, indicates the state of the result data of the floating point calculation request;

the floating point calculation result number (vec_result_status), sent by the processor floating point calculation unit, indicates the floating point calculation request number to which the result data corresponds.

The page table translation path is used by the vector unit and the processor page table translation unit to transfer requests and data of page table translation operations. Referring to fig. 6, the page table translation path in this embodiment includes the following signals:

the vector memory access address page table translation request valid bit (vec_core_tbw_req_vld), sent by the vector unit, flags the sending of a page table translation request;

the vector memory access address page table translation request virtual address (vec_core_tbw_req_va), sent by the vector unit, indicates the virtual address of the page table translation request;

the vector memory access address page table translation request type, sent by the vector unit, indicates the type of the page table translation request;

the vector memory access address page table translation request number (vec_core_tbw_req_id), sent by the vector unit, indicates the accelerated calculation request number to which the page table translation request corresponds;

the vector memory access address page table translation request return physical address valid bit (core_vec_tbw_rsp_vld), sent by the general-purpose processor, is the valid flag of the returned physical address;

the vector memory access address page table translation request return physical address (core_vec_tbw_rsp_pa), sent by the general-purpose processor, carries the physical address returned for the page table translation request;

the vector memory access address page table translation request return physical address state (core_vec_tbw_rsp_state), sent by the general-purpose processor, indicates the state of the returned physical address;

the vector memory access address page table translation request return physical address response number (core_vec_tbw_rsp_id), sent by the general-purpose processor, indicates the accelerated calculation request number to which the returned physical address corresponds.
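In the signal names given for the five paths, the prefix appears to encode the direction of travel: signals described as sent by the general-purpose processor core side begin with core_vec_, and signals described as sent by the vector unit begin with vec_core_. The sketch below checks that convention; the helper names are hypothetical and the rule is inferred from the listed names, not stated by the patent.

```python
def signal_direction(name: str) -> tuple:
    """Return (sender, receiver) inferred from the signal-name prefix."""
    if name.startswith("core_vec_"):
        return ("core", "vector_unit")
    if name.startswith("vec_core_"):
        return ("vector_unit", "core")
    raise ValueError(f"unrecognised signal prefix: {name}")


def check_handshake_pair(request_valid: str, request_ack: str) -> bool:
    """A request valid bit and its ack should travel in opposite directions."""
    sender, receiver = signal_direction(request_valid)
    return signal_direction(request_ack) == (receiver, sender)
```

For instance, `check_handshake_pair("vec_core_mem_req_vld", "core_vec_mem_req_ack")` holds, since the request leaves the vector unit and the feedback bit returns from the core side.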

In addition, this embodiment further provides a processor comprising a general-purpose processor core, an interface component and a vector unit, where the general-purpose processor core is connected to the vector unit through the interface component, the interface component is the aforementioned cooperative interface of the general-purpose processor core and the vector unit, and the general-purpose processor core may be single-core or multi-core.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims

1. A method for coordinating a general purpose processor core with a vector component, comprising:

2. The method for cooperation between a general-purpose processor core and a vector unit as claimed in claim 1, wherein step 1) comprises: 1.1) the vector unit receives a vector request sent by the general-purpose processor core through a control path, the received vector request comprising an operation type, a request number, source operands A, B and C and a result register number, sent respectively through the vector calculation request type, vector calculation request number, vector calculation request source operand A, vector calculation request source operand B, vector calculation request source operand C and vector calculation request result register number ports, and the signals sent on these ports are identified as valid by the vector calculation request valid bit; 1.2) after receiving and saving the vector request, the vector unit notifies the processor control unit of the general-purpose processor core through the vector request receive feedback bit port of the control path.

3. The method of claim 2, wherein the step 2) of sending the memory access request to the processor data cache unit of the general-purpose processor core comprises: 2.1A) for cache data that needs to be accessed, the memory access operation type, the memory access request number, the memory access request physical address and the memory access request write data are sent to the processor data cache unit through the memory access request type, memory access request number, memory access request physical address and memory access request write data ports of a memory access path, respectively, and these signals are identified as valid by the vector unit memory access request valid bit; 2.2A) after receiving and saving the memory access request, the processor data cache unit notifies the vector unit through the vector unit memory access request receive feedback bit; 2.3A) if the memory access operation type is a data read, then after completing the read the processor data cache unit sends the acquired data, the data state and the corresponding memory access request number to the vector unit through the memory access request return data, return data state and return data number ports, and these signals are identified as valid by the vector unit memory access request return data valid bit; 2.4A) if the memory access operation type is a data write, then after writing the memory access write data to the specified location the processor data cache unit sends the data state and the corresponding memory access request number to the vector unit through the memory access request return data state and return data number ports, and these signals are identified as valid by the vector unit memory access request return data valid bit.

4. The method of claim 2, wherein sending the register request to the processor register unit of the general-purpose processor core in step 2) comprises: 2.1B) for the register to be accessed, the register operation type, register number and register write data are sent to the processor register unit of the general-purpose processor core through the register request type, register request register number and register request write data ports of the register path respectively, and these signals are marked valid by the vector unit register request valid bit; 2.2B) after receiving and saving the register request, the processor register unit notifies the vector unit through the vector unit register request receive feedback bit port of the register path; 2.3B) if the register operation type is a data read, then after completing the register read the processor register unit sends the acquired data, the data state and the corresponding register number to the vector unit through the register request return data, register request return data state and register request return data register number ports, and these signals are marked valid by the vector unit register request return data valid bit; 2.4B) if the register operation type is a data write, then after writing the register write data to the specified register the processor register unit sends the register state and the corresponding register number to the vector unit through the register request return data state and register request return data register number ports, and these signals are marked valid by the vector unit register request return data valid bit.
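
A minimal Python sketch of how the processor register unit could answer a register request under steps 2.3B) and 2.4B) is given below. The function name, the tuple layout of the response and the state strings are hypothetical, and reading an unwritten register as 0 is an assumption.

```python
from typing import Dict, Optional, Tuple

# Response layout (an assumption): (valid bit, state, register number, data)
Response = Tuple[bool, str, int, Optional[int]]


def serve_register_request(
    regs: Dict[int, int], op: str, reg_no: int, write_data: int = 0
) -> Response:
    """Model the reply of the processor register unit to one register request."""
    if op == "write":
        regs[reg_no] = write_data
        # Step 2.4B: a write returns only the state and the register number.
        return (True, "ok", reg_no, None)
    if op == "read":
        # Step 2.3B: a read returns the data, its state and the register number.
        return (True, "ok", reg_no, regs.get(reg_no, 0))
    # Unknown operation types are not covered by the claim; this model
    # simply leaves the return valid bit low.
    return (False, "unsupported", reg_no, None)
```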

5. The method of claim 2, wherein sending the floating point calculation request to the processor floating point calculation unit of the general-purpose processor core in step 2) comprises: 2.1C) for the floating point calculation to be performed, the floating point calculation type, floating point calculation request number and source operands A, B and C are sent to the processor floating point calculation unit through the floating point calculation request type, floating point calculation request number, floating point calculation request source operand A, floating point calculation request source operand B and floating point calculation request source operand C ports of the floating point calculation path respectively, and these signals are marked valid by the vector unit floating point calculation request valid bit; 2.2C) after receiving and saving the floating point calculation request, the processor floating point calculation unit notifies the vector unit through the floating point calculation request receive feedback bit of the floating point calculation path; 2.3C) after completing the corresponding floating point calculation, the processor floating point calculation unit sends the resulting floating point calculation result data, floating point calculation result state and corresponding floating point calculation result number to the vector unit through the floating point calculation result data, floating point calculation result state and floating point calculation result number ports of the floating point calculation path, and these signals are marked valid by the floating point calculation result valid bit.
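
The role of the floating point calculation result number in step 2.3C) can be illustrated with the following Python sketch, in which results arrive in a different order from the requests and are matched by number. The operation names "fma" and "add", the state strings and all identifiers are assumptions; the claim does not state which operations use the three source operands.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FpRequest:
    op: str       # floating point calculation type (names are assumptions)
    req_id: int   # floating point calculation request number
    a: float      # source operand A
    b: float      # source operand B
    c: float      # source operand C


@dataclass
class FpResult:
    valid: bool   # floating point calculation result valid bit
    status: str   # floating point calculation result state
    req_id: int   # floating point calculation result number
    data: float   # floating point calculation result data


def fp_execute(req: FpRequest) -> FpResult:
    """Model of the processor floating point calculation unit."""
    if req.op == "fma":
        # Plain multiply then add; unlike a hardware FMA this rounds twice.
        return FpResult(True, "ok", req.req_id, req.a * req.b + req.c)
    if req.op == "add":
        return FpResult(True, "ok", req.req_id, req.a + req.b)
    return FpResult(True, "unsupported", req.req_id, 0.0)


def collect(results: List[FpResult]) -> Dict[int, float]:
    """Vector unit side: match each valid result to its request by number."""
    return {r.req_id: r.data for r in results if r.valid and r.status == "ok"}
```

Because each result carries its own number, the vector unit in this model does not depend on results returning in issue order.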

6. The method of claim 2, wherein sending the page table translation request to the processor page table translation unit of the general-purpose processor core in step 2) comprises: 2.1D) for a vector memory access address requiring page table translation, the virtual address and the translation request number are sent to the processor page table translation unit through the vector memory access address page table translation request virtual address and vector memory access address page table translation request number ports of the page table translation path respectively, and these signals are marked valid by the vector memory access address page table translation request valid bit; 2.2D) after completing the page table translation of the corresponding vector memory access address, the processor page table translation unit sends the resulting physical address, the physical address state and the corresponding response number to the vector unit through the vector memory access address page table translation request return physical address, vector memory access address page table translation request return physical address state and vector memory access address page table translation request return physical address response number ports of the page table translation path, and these signals are marked valid by the vector memory access address page table translation request return physical address valid bit.
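
For illustration, the request and response of steps 2.1D) and 2.2D) might be modelled in Python as follows. The 4 KiB page size, the single-level dictionary used as a page table, the "fault" state and all identifiers are assumptions; the claim specifies only which signals are exchanged, not how the translation is performed.

```python
from typing import Dict, Tuple

PAGE_BITS = 12  # assumption: 4 KiB pages (not specified in the claim)

# Response layout (an assumption):
# (valid bit, physical address state, response number, physical address)
Translation = Tuple[bool, str, int, int]


def translate(page_table: Dict[int, int], virt_addr: int, req_id: int) -> Translation:
    """Model of the processor page table translation unit for one request."""
    vpn = virt_addr >> PAGE_BITS                  # virtual page number
    offset = virt_addr & ((1 << PAGE_BITS) - 1)   # offset within the page
    if vpn not in page_table:
        # No mapping: report a hypothetical "fault" state with address 0.
        return (True, "fault", req_id, 0)
    phys_addr = (page_table[vpn] << PAGE_BITS) | offset
    return (True, "ok", req_id, phys_addr)
```

With a table mapping virtual page 0x12 to physical page 0x80, the virtual address 0x12ABC translates to 0x80ABC in this model.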

7. The method as claimed in claim 1, wherein returning the final vector calculation result to the general-purpose processor core by the vector unit in step 3) comprises: after completing all required operations corresponding to the vector request Vec Inst0, the vector unit sends the final vector calculation result state, the vector calculation result number and the vector unit state to the processor control unit through the vector calculation result state, vector calculation result number and vector component state ports of the control path, and these signals are marked valid by the vector calculation result valid bit.
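
One possible way to decide when "all required operations" of a vector request have completed is to count its outstanding sub-operations, as in the hypothetical Python sketch below. The counting scheme, the "busy" and "idle" unit states and all names are assumptions and are not taken from the claim.

```python
from typing import Dict, Optional, Tuple

# Result layout (an assumption):
# (result valid bit, result state, result number, vector unit state)
Result = Tuple[bool, str, int, str]


class CompletionTracker:
    def __init__(self) -> None:
        # Maps a vector request number to its count of unfinished sub-operations.
        self.outstanding: Dict[int, int] = {}

    def start(self, req_id: int, sub_ops: int) -> None:
        """Record how many sub-operations (at least one) a request needs."""
        self.outstanding[req_id] = sub_ops

    def sub_op_done(self, req_id: int) -> Optional[Result]:
        """Return the control path result once the last sub-operation finishes."""
        self.outstanding[req_id] -= 1
        if self.outstanding[req_id] > 0:
            return None                      # result valid bit stays low
        del self.outstanding[req_id]
        unit_state = "busy" if self.outstanding else "idle"
        return (True, "ok", req_id, unit_state)
```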

8. A cooperative interface between a general-purpose processor core and a vector unit for applying the method of any one of claims 1 to 7, wherein the cooperative interface comprises: a control path through which the processor control unit of the general-purpose processor core and the vector unit exchange operation instructions and states; a register path through which the processor register unit of the general-purpose processor core and the vector unit exchange register requests and data; a memory access path through which the vector unit and the processor data cache unit of the general-purpose processor core exchange memory access requests and data; a floating point calculation path through which the vector unit and the processor floating point calculation unit of the general-purpose processor core exchange requests and data for floating point operations; and a page table translation path through which the vector unit and the processor page table translation unit of the general-purpose processor core exchange requests and data for page table translation operations; wherein the control path, the register path, the memory access path, the floating point calculation path and the page table translation path are each connected between the general-purpose processor core and the vector unit.
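
The five paths and the processor core unit at the far end of each can be summarized in a short Python sketch. The enum, the lookup tables and the strings used to describe what a request needs are hypothetical names chosen for illustration.

```python
from enum import Enum


class Path(Enum):
    CONTROL = "control path"
    REGISTER = "register path"
    MEMORY_ACCESS = "memory access path"
    FLOATING_POINT = "floating point calculation path"
    PAGE_TABLE = "page table translation path"


# Unit of the general-purpose processor core at the other end of each path.
CORE_ENDPOINT = {
    Path.CONTROL: "processor control unit",
    Path.REGISTER: "processor register unit",
    Path.MEMORY_ACCESS: "processor data cache unit",
    Path.FLOATING_POINT: "processor floating point calculation unit",
    Path.PAGE_TABLE: "processor page table translation unit",
}

# What the vector unit needs, mapped to the path it uses (keys are assumptions).
NEED_TO_PATH = {
    "cache data": Path.MEMORY_ACCESS,
    "register": Path.REGISTER,
    "floating point": Path.FLOATING_POINT,
    "page table": Path.PAGE_TABLE,
}


def route(need: str) -> Path:
    """Pick the path for one kind of request issued by the vector unit."""
    return NEED_TO_PATH[need]
```

The control path is absent from the lookup by need because, per the method, it carries the vector request itself and the final result rather than a sub-request issued by the vector unit.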

9. The cooperative interface of claim 8, wherein the control path includes a vector request valid bit, a vector request type, a vector request number, a vector request source operand A, a vector request source operand B, a vector request source operand C, a vector request result register number, a vector request receive feedback bit, a vector calculation result valid bit, a vector calculation result state, a vector calculation result number and a vector component state, wherein: the vector request valid bit is sent by the general-purpose processor and indicates that a vector request is being sent; the vector request type is sent by the general-purpose processor and indicates the operation type of the vector request; the vector request number is sent by the general-purpose processor and indicates the number of the vector request within the general-purpose processor; the vector request source operand A is sent by the general-purpose processor and carries the data or register number of operand A of the vector request; the vector request source operand B is sent by the general-purpose processor and carries the data or register number of operand B of the vector request; the vector request source operand C is sent by the general-purpose processor and carries the data or register number of operand C of the vector request; the vector request result register number is sent by the general-purpose processor and indicates the result register to be used by the vector request; the vector request receive feedback bit is sent by the vector unit and indicates that the vector request has been successfully received by the vector unit; the vector calculation result valid bit is sent by the vector unit and is the valid flag of the result of the vector request; the vector calculation result state is sent by the vector unit and indicates the state of the result data of the vector request; the vector calculation result number is sent by the vector unit and indicates the number, within the general-purpose processor, of the request that produced the vector calculation result data; the vector component state is sent by the vector unit and indicates the running state of the vector unit;
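
The direction of each control path signal listed above can be tabulated as in the following Python sketch. The dictionary and helper function are illustrative only; the signal names follow the claim wording and the two sender labels are shorthand.

```python
from typing import List

CORE = "general-purpose processor"
VECTOR = "vector unit"

# Control path signal name mapped to the side that drives it.
CONTROL_PATH_SENDER = {
    "vector request valid bit": CORE,
    "vector request type": CORE,
    "vector request number": CORE,
    "vector request source operand A": CORE,
    "vector request source operand B": CORE,
    "vector request source operand C": CORE,
    "vector request result register number": CORE,
    "vector request receive feedback bit": VECTOR,
    "vector calculation result valid bit": VECTOR,
    "vector calculation result state": VECTOR,
    "vector calculation result number": VECTOR,
    "vector component state": VECTOR,
}


def signals_sent_by(sender: str) -> List[str]:
    """List the control path signals driven by the given side."""
    return [name for name, s in CONTROL_PATH_SENDER.items() if s == sender]
```

Of the twelve control path signals, seven are driven by the general-purpose processor and five by the vector unit.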

10. A processor comprising a general purpose processor core, an interface unit and a vector unit, said general purpose processor core being connected to the vector unit via the interface unit, characterized in that said interface unit is a cooperative interface of the general purpose processor core and the vector unit as claimed in claim 8 or 9.