CN113806250B - Method for coordinating general processor core and vector component, interface and processor - Google Patents

Info

Publication number
CN113806250B
Authority
CN
China
Prior art keywords
vector
request
register
floating point
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111123314.7A
Other languages
Chinese (zh)
Other versions
CN113806250A (en)
Inventor
郭维
邓全
雷国庆
郭辉
王俊辉
郑重
黄立波
隋兵才
倪晓强
孙彩霞
王永文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202111123314.7A
Publication of CN113806250A
Application granted
Publication of CN113806250B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 Prefetch instructions; cache control instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30138 Extension of register space, e.g. register cache

Abstract

The invention discloses a method for coordinating a general processor core and a vector unit, together with an interface and a processor. In the method, the vector unit receives a vector request and determines what the request requires: if cached data must be accessed, it sends a memory access request to the processor data cache unit of the general processor core; if register access is required, it sends a register request to the processor register unit of the general processor core; if floating point calculation is required, it sends a floating point calculation request to the processor floating point calculation unit of the general processor core; and if page table translation is required, it sends a page table translation request to the processor page table translation unit of the general processor core. By using an efficient, loosely coupled dedicated interface between the general processor core and the vector unit, the invention enables direct interaction with the control unit, register unit, data cache unit, floating point calculation unit, page table translation unit and other related units, and improves the data read/write efficiency and the instruction execution efficiency of the vector unit.

Description

Method for coordinating general processor core and vector component, interface and processor
Technical Field
The invention relates to a design technology of a cooperation interface of a general processor core and a vector component, in particular to a cooperation method of the general processor core and the vector component, an interface and a processor.
Background
With the continuous development of computer technology, the demand for different kinds of computation keeps growing. Because of their design goals and other factors, general-purpose processors are usually inefficient at vector computation or neural network computation and cannot provide the computation speed or data scale that such workloads require. At the same time, vector and neural network workloads still contain a substantial amount of general-purpose computation, and vector components typically provide little or no general-purpose computing capability. Solutions have therefore emerged in which a general-purpose processor core cooperates with a dedicated graphics accelerator (e.g., CPU-GPU), but vector component vendors typically provide programming support only for their own devices. In heterogeneous systems it is generally difficult to program different devices in a uniform programming style, and it is also very difficult to treat different devices as a unified computing unit. For GPU acceleration, for example, the two major vendors NVIDIA and AMD develop and maintain independent GPU programming toolkits, the CUDA SDK and AMD APP (ATI STREAM), respectively. Such explicitly heterogeneous solutions therefore place high demands on programming and heterogeneous coordination, especially memory management, where the programmer must explicitly declare data and explicitly move it between main memory and device memory, which makes them difficult to use in practice.
To provide vector-oriented computing power in a general-purpose processor, using a separate vector component is an efficient solution: dedicated instructions are reserved in the instruction set design, a dedicated interface is reserved in the general processor core, and the vector component and the general processor interact through an interface protocol. This scheme fully retains the general processing capability of the general-purpose processor and can offload the acceleration-critical part of a vector computation to the vector component, maximizing the efficiency of both vector and general computation. In existing designs, however, control and data signals interact too frequently and pipeline timing is tight; in addition, certain resources in the vector component are insufficient while the corresponding general processor resources sit idle and are wasted.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a method for coordinating a general processor core and a vector unit, an interface and a processor, in which an efficient, loosely coupled dedicated interface between the general processor core and the vector unit enables direct interaction with the control unit, register unit, data cache unit, floating point calculation unit, page table translation unit and other related units, thereby improving the data read/write efficiency and the instruction execution efficiency of the vector unit.
In order to solve the technical problems, the invention adopts the technical scheme that:
a method of coordinating a general purpose processor core with a vector component, comprising:
1) The vector unit receives a vector request sent by the general-purpose processor core;
2) The vector component judges whether the vector request needs to access the cache data, if the cache data needs to be accessed, the vector component sends a memory access request to a processor data cache component of the general processor core, so that the processor data cache component processes the memory access request and returns the data or the state of the memory access request; the vector unit judges whether the vector request needs register access, if so, the vector unit sends the register request to a processor register unit of the general processor core, so that the processor register unit processes the register request and returns the data or the state of the register request; the vector unit judges whether the vector request needs floating point calculation, and if the vector request needs floating point calculation, the vector unit sends a floating point calculation request to the processor floating point calculation unit of the general processor core, so that the processor floating point calculation unit processes the floating point calculation request and returns a floating point calculation result; the vector component judges whether the vector request needs page table translation, if the vector request needs page table translation, the vector component sends a page table translation request to a processor page table translation component of the general processor core, so that the processor page table translation component processes the page table translation request and returns a page table translation result;
3) Judge whether the operation is finished; if it is not finished, return to step 2); otherwise the vector component returns the final vector computation result to the general purpose processor core.
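Steps 1) to 3) amount to a dispatch loop inside the vector unit. The following Python sketch is a minimal software model of that loop, not the patented hardware: the names `SubRequest` and `run_vector_request` and the tags `"mem"`, `"reg"`, `"fp"` and `"pt"` are hypothetical, and the four core-side units are stood in for by plain callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SubRequest:
    """One operation a vector request needs from the core (hypothetical model)."""
    kind: str      # "mem", "reg", "fp" or "pt" (illustrative tags, not from the patent)
    payload: Any


def run_vector_request(pending: List[SubRequest],
                       core_units: Dict[str, Callable[[Any], Any]]) -> List[Any]:
    """Model of steps 2) and 3): dispatch until no sub-request is left."""
    results = []
    queue = list(pending)
    while queue:                              # step 3): not finished, go back to step 2)
        req = queue.pop(0)
        handler = core_units[req.kind]        # step 2): choose the core unit by need
        results.append(handler(req.payload))  # the unit returns data or a state
    return results                            # stands in for the final result sent to the core


# Stand-ins for the four core-side units; the real units are hardware blocks.
core_units = {
    "mem": lambda addr: {"addr": addr, "data": 0},
    "reg": lambda regno: {"reg": regno, "data": 0},
    "fp": lambda ops: ops[0] + ops[1],
    "pt": lambda vaddr: vaddr | 0x8000_0000,
}

demo = run_vector_request(
    [SubRequest("pt", 0x40), SubRequest("fp", (1.5, 2.0))],
    core_units,
)
```

In this model each sub-request is handled to completion before the next one is issued; the text above does not say whether the hardware overlaps them, so that ordering is an assumption of the sketch.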
Optionally, step 1) comprises: 1.1) the vector unit receives the vector request sent by the general processor core through the control path; the received vector request includes the operation type, the request number, the source operands A, B and C and the result register number, which are sent through the vector calculation request type, vector calculation request number, vector calculation request source operand A, vector calculation request source operand B, vector calculation request source operand C and vector calculation request result register number ports, respectively, and the signals sent on these ports are marked valid by the vector calculation request valid bit; 1.2) after it has received and saved the vector request, the vector unit notifies the processor control unit of the general-purpose processor core through the vector request receive feedback bit port of the control path.
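Steps 1.1) and 1.2) describe a valid/acknowledge handshake on the control path. The sketch below models it cycle by cycle under stated assumptions: the class name `VectorUnitFrontEnd`, the field names and the buffer depth are hypothetical, and the behaviour when the request cannot be saved (the feedback bit simply stays low) is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VectorRequest:
    """Fields carried on the control path ports listed in step 1.1)."""
    req_type: int
    req_number: int
    src_a: int
    src_b: int
    src_c: int
    result_reg: int


class VectorUnitFrontEnd:
    def __init__(self, depth: int = 2) -> None:
        self.depth = depth                 # buffer depth is an assumption
        self.saved: List[VectorRequest] = []

    def cycle(self, valid: bool, request: Optional[VectorRequest]) -> bool:
        """Return the receive feedback bit for one cycle (step 1.2)."""
        if valid and request is not None and len(self.saved) < self.depth:
            self.saved.append(request)     # request received and saved
            return True
        return False                       # nothing offered, or no room to save it


front_end = VectorUnitFrontEnd(depth=2)
acks = [front_end.cycle(True, VectorRequest(0, n, 1, 2, 3, 4)) for n in range(3)]
```

With a depth of 2, the third request in the example is not acknowledged, so in this model the core would have to keep presenting it until the feedback bit is raised.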
Optionally, sending the memory access request to the processor data cache component of the general-purpose processor core in step 2) includes: 2.1A) for the cache data to be accessed, the memory access operation type, the memory access request number, the memory access request physical address and the memory access request write data are sent to the processor cache data component through the memory access request type, memory access request number, memory access request physical address and memory access request write data ports of the memory access path, respectively, and these signals are marked valid by the vector component memory access request valid bit; 2.2A) after it has received and saved the memory access request, the processor cache data component notifies the vector component through the vector component memory access request receive feedback bit; 2.3A) if the memory access operation type is a data read, then after completing the read the processor cache data component sends the acquired data, the data state and the corresponding memory access request number to the vector component through the memory access request return data, memory access request return data state and memory access request return data number ports, and these signals are marked valid by the vector component memory access request return data valid bit; 2.4A) if the memory access operation type is a data write, then after writing the memory access write data to the specified location the processor cache data component sends the data state and the corresponding memory access request number to the vector component through the memory access request return data state and memory access request return data number ports, and these signals are marked valid by the vector component memory access request return data valid bit.
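A short sketch of steps 2.1A) to 2.4A) follows. It is a toy model only: `CacheDataModel`, the `READ`/`WRITE` encodings and `STATE_OK` are hypothetical, and the cache is reduced to a dictionary keyed by physical address. The `outstanding` table shows one plausible use of the returned request number, namely matching a response to the request that caused it; the text above does not state how the vector component uses the number.

```python
from typing import Dict, Optional, Tuple

READ, WRITE = 0, 1   # hypothetical encodings of the memory access request type
STATE_OK = 0         # hypothetical "return data state" value


class CacheDataModel:
    """Toy stand-in for the processor cache data component."""

    def __init__(self) -> None:
        self.lines: Dict[int, Optional[int]] = {}

    def handle(self, req_type: int, req_number: int, phys_addr: int,
               write_data: Optional[int] = None) -> Tuple[Optional[int], int, int]:
        """Return (return data, return data state, return data number)."""
        if req_type == READ:                                        # step 2.3A)
            return self.lines.get(phys_addr, 0), STATE_OK, req_number
        self.lines[phys_addr] = write_data                          # step 2.4A)
        return None, STATE_OK, req_number


cache = CacheDataModel()
outstanding = {7: "store x", 8: "load x"}        # vector-unit side bookkeeping by number
responses = [cache.handle(WRITE, 7, 0x100, 42), cache.handle(READ, 8, 0x100)]
for _data, _state, number in responses:
    outstanding.pop(number)                      # retire the matching request
```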
Optionally, sending the register request to the processor register unit of the general-purpose processor core in step 2) includes: 2.1B) for the register to be accessed, the register operation type, the register number and the register write data are sent to the processor register component of the general processor core through the register request type, register request register number and register request write data ports of the register path, respectively, and these signals are marked valid by the vector component register request valid bit; 2.2B) after it has received and saved the register request, the processor register unit notifies the vector unit through the vector unit register request receive feedback bit port of the register path; 2.3B) if the register operation type is a data read, then after completing the register read the processor register component sends the acquired data, the data state and the corresponding register number to the vector component through the register request return data, register request return data state and register request return data register number ports, and these signals are marked valid by the vector component register request return data valid bit; 2.4B) if the register operation type is a data write, then after writing the register write data into the designated register the processor register unit sends the register state and the corresponding register number to the vector unit through the register request return data state and register request return data register number ports, and these signals are marked valid by the vector unit register request return data valid bit.
Optionally, sending the floating point calculation request to the processor floating point calculation unit of the general purpose processor core in step 2) includes: 2.1C) for the floating point calculation to be processed, the floating point calculation type, the floating point calculation request number and the source operands A, B and C are sent to the processor floating point calculation unit through the floating point calculation request type, floating point calculation request number, floating point calculation request source operand A, floating point calculation request source operand B and floating point calculation request source operand C ports of the floating point calculation path, respectively, and these signals are marked valid by the vector unit floating point calculation request valid bit; 2.2C) after it has received and saved the floating point calculation request, the processor floating point calculation unit notifies the vector unit through the floating point calculation request receive feedback bit of the floating point calculation path; 2.3C) after completing the corresponding floating point calculation, the processor floating point calculation unit sends the resulting floating point calculation result data, floating point calculation result state and corresponding floating point calculation result number to the vector unit through the floating point calculation result data, floating point calculation result state and floating point calculation result number ports of the floating point calculation path, and these signals are marked valid by the floating point calculation result valid bit.
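The request and result of steps 2.1C) and 2.3C) can be sketched as a single function. The operation set below (`FP_ADD`, `FP_MUL`, `FP_FMA`) and the state values are purely illustrative assumptions; the text only says that a request carries a type, a number and three source operands, and that the result carries data, a state and a number.

```python
import math
from typing import Tuple

FP_ADD, FP_MUL, FP_FMA = 0, 1, 2      # hypothetical request type encodings
STATE_OK, STATE_INVALID = 0, 1        # hypothetical result state values


def fp_unit(req_type: int, req_number: int,
            a: float, b: float, c: float) -> Tuple[float, int, int]:
    """Return (result data, result state, result number) as in step 2.3C)."""
    if req_type == FP_ADD:
        value = a + b
    elif req_type == FP_MUL:
        value = a * b
    elif req_type == FP_FMA:
        value = a * b + c   # not a true fused op: Python rounds the product first
    else:
        return math.nan, STATE_INVALID, req_number   # unknown type reported via the state
    return value, STATE_OK, req_number


fma_result = fp_unit(FP_FMA, 3, 2.0, 4.0, 1.0)
bad_result = fp_unit(99, 5, 0.0, 0.0, 0.0)
```

Echoing the request number back with the result lets the vector unit tell which of several floating point requests a result belongs to.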
Optionally, sending the page table translation request to the processor page table translation component of the general purpose processor core in step 2) includes: 2.1D) for the vector memory access address requiring page table translation, the virtual address and the page table translation request number are sent to the processor page table translation component through the vector memory access address page table translation request virtual address and vector memory access address page table translation request number ports of the page table translation path, respectively, and these signals are marked valid by the vector memory access address page table translation request valid bit; 2.2D) after completing page table translation of the corresponding vector memory access address, the processor page table translation component sends the returned physical address, the returned physical address state and the corresponding returned physical address response number to the vector component through the vector memory access address page table translation request returned physical address, vector memory access address page table translation request returned physical address state and vector memory access address page table translation returned physical address response number ports of the page table translation path, and these signals are marked valid by the vector memory access address page table translation request returned physical address valid bit.
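As an illustration of steps 2.1D) and 2.2D), the sketch below translates a virtual address and returns the three response fields. It assumes 4 KiB pages and a flat, single-level table held in a dictionary, neither of which comes from the text; `translate`, `PAGE_SHIFT` and the state values are hypothetical names.

```python
from typing import Dict, Tuple

PAGE_SHIFT = 12                       # 4 KiB pages, an assumption
STATE_OK, STATE_FAULT = 0, 1          # hypothetical returned physical address states


def translate(page_table: Dict[int, int], req_number: int,
              virt_addr: int) -> Tuple[int, int, int]:
    """Return (physical address, address state, response number) as in step 2.2D)."""
    vpn = virt_addr >> PAGE_SHIFT
    offset = virt_addr & ((1 << PAGE_SHIFT) - 1)
    if vpn not in page_table:
        return 0, STATE_FAULT, req_number       # no mapping: report it in the state
    return (page_table[vpn] << PAGE_SHIFT) | offset, STATE_OK, req_number


page_table = {0x12: 0x80}             # virtual page 0x12 maps to physical frame 0x80
hit = translate(page_table, 1, 0x12ABC)
miss = translate(page_table, 2, 0x99000)
```

The physical address obtained this way is what the vector component would then place on the memory access request physical address port.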
Optionally, the vector unit returning the final vector calculation result to the general processor core in step 3) includes: after finishing all operations required by the vector request Vec_Inst0, the vector component sends the final vector calculation result state, the vector calculation result number and the vector component state to the processor control component through the vector calculation result state, vector calculation result number and vector component state ports of the control path, and these signals are marked valid by the vector calculation result valid bit.
In addition, the present invention also provides a cooperation interface between a general purpose processor core and a vector unit, for applying the aforementioned method for coordinating a general purpose processor core and a vector unit. The cooperation interface includes: a control path, through which the processor control unit of the general processor core and the vector unit exchange operation instructions and states; a register path, through which the processor register unit of the general purpose processor core and the vector unit exchange register requests and data; a memory access path, through which the vector component and the processor cache data component of the general processor core exchange memory access requests and data; a floating point calculation path, through which the vector unit and the processor floating point unit of the general purpose processor core exchange floating point operation requests and data; and a page table translation path, through which the vector unit and the processor page table translation unit of the general purpose processor core exchange page table translation requests and data. The control path, the register path, the memory access path, the floating point calculation path and the page table translation path are each connected between the general processor core and the vector unit.
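The five paths can be summarised as a small lookup table. This is only a reading aid: the `Path` enumeration and the `PATHS` dictionary are hypothetical names, and the "initiator" column records which side raises the request valid bit on each path according to the signal descriptions in this document.

```python
from enum import Enum


class Path(Enum):
    CONTROL = "control"
    REGISTER = "register"
    MEMORY_ACCESS = "memory access"
    FLOATING_POINT = "floating point calculation"
    PAGE_TABLE = "page table translation"


# For each path: (core-side endpoint, side that issues requests on the path).
PATHS = {
    Path.CONTROL: ("processor control unit", "general processor core"),
    Path.REGISTER: ("processor register unit", "vector unit"),
    Path.MEMORY_ACCESS: ("processor cache data component", "vector unit"),
    Path.FLOATING_POINT: ("processor floating point unit", "vector unit"),
    Path.PAGE_TABLE: ("processor page table translation unit", "vector unit"),
}

# Paths on which the vector unit, not the core, starts the exchange.
vector_initiated = [p for p, (_unit, initiator) in PATHS.items()
                    if initiator == "vector unit"]
```

Only the control path is driven from the core side; on the other four paths the vector unit issues requests directly to the relevant core unit.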
Optionally, the control path includes the following signals:
vector request valid bit: sent by the general processor; indicates that a vector request is being sent;
vector request type: sent by the general processor; the operation type of the vector request;
vector request number: sent by the general processor; the number of the vector request within the general processor;
vector request source operand A: sent by the general processor; the data or register number of operand A of the vector request;
vector request source operand B: sent by the general processor; the data or register number of operand B of the vector request;
vector request source operand C: sent by the general processor; the data or register number of operand C of the vector request;
vector request result register number: sent by the general processor; the result register number to be used by the vector request;
vector request receive feedback bit: sent by the vector component; indicates that the vector request was successfully received by the vector component;
vector calculation result valid bit: sent by the vector component; the result valid flag bit of the vector request;
vector calculation result state: sent by the vector component; the state of the result data of the vector request;
vector calculation result number: sent by the vector component; the number, within the general processor, of the request that produced the vector calculation result data;
vector component state: sent by the vector component; the running state of the vector component;
the register path includes the following signals:
vector unit register request valid bit: sent by the vector unit; the flag bit indicating that a vector unit register request is being sent;
vector unit register request type: sent by the vector unit; the operation type of the vector unit register request;
vector unit register request register number: sent by the vector unit; the register number targeted by the vector unit register request;
vector unit register request write data: sent by the vector unit; the data to be written by the vector unit register request;
vector unit register request receive feedback bit: sent by the processor register unit; indicates that the vector unit register request was successfully received by the register unit;
vector unit register request return data valid bit: sent by the processor register unit; the valid flag bit of the return data of the vector unit register request;
vector unit register request return data: sent by the processor register unit; the return data of the vector unit register request;
vector unit register request return data state: sent by the processor register unit; the state of the return data of the vector unit register request;
vector unit register request return data register number: sent by the processor register unit; the register number corresponding to the return data of the vector unit register request;
the memory access path includes the following signals:
vector component memory access request valid bit: sent by the vector component; the flag bit indicating that a vector component memory access request is being sent;
vector component memory access request type: sent by the vector component; the operation type of the vector component memory access request;
vector component memory access request number: sent by the vector component; the number of the vector component memory access request;
vector component memory access request physical address: sent by the vector component; the physical address of the vector component memory access request;
vector component memory access request write data: sent by the vector component; the data to be written by the vector component memory access request;
vector component memory access request receive feedback bit: sent by the processor cache data component; indicates that the vector component memory access request was successfully received by the cache data component;
vector component memory access request return data valid bit: sent by the processor cache data component; the valid flag bit of the return data of the vector component memory access request;
vector component memory access request return data: sent by the processor cache data component; the return data of the vector component memory access request;
vector component memory access request return data state: sent by the processor cache data component; the state of the return data of the vector component memory access request;
vector component memory access request return data number: sent by the processor cache data component; the memory access request number corresponding to the return data of the vector component memory access request;
the floating point calculation path includes the following signals:
floating point calculation request valid bit: sent by the vector unit; the flag bit indicating that a floating point calculation request is being sent;
floating point calculation request type: sent by the vector unit; the operation type of the floating point calculation request;
floating point calculation request number: sent by the vector unit; the number of the floating point calculation request;
floating point calculation request source operand A: sent by the vector unit; the data of operand A of the floating point calculation request;
floating point calculation request source operand B: sent by the vector unit; the data of operand B of the floating point calculation request;
floating point calculation request source operand C: sent by the vector unit; the data of operand C of the floating point calculation request;
floating point calculation request receive feedback bit: sent by the processor floating point calculation unit; indicates that the floating point calculation request was successfully received by the floating point unit;
floating point calculation result valid bit: sent by the processor floating point calculation unit; the valid flag bit of the result data of the floating point calculation request;
floating point calculation result data: sent by the processor floating point calculation unit; the result data of the floating point calculation request;
floating point calculation result state: sent by the processor floating point calculation unit; the state parameters of the result data of the floating point calculation request;
floating point calculation result number: sent by the processor floating point calculation unit; the floating point calculation request number corresponding to the result data of the floating point calculation request;
the page table translation path includes the following signals:
vector memory access address page table translation request valid bit: sent by the vector component; the flag bit indicating that a vector memory access address page table translation request is being sent;
vector memory access address page table translation request virtual address: sent by the vector component; the virtual address of the vector memory access address page table translation request;
vector memory access address page table translation request type: sent by the vector component; the type of the vector memory access address page table translation request;
vector memory access address page table translation request number: sent by the vector component; the accelerated calculation request number corresponding to the vector memory access address page table translation request;
vector memory access address page table translation request returned physical address valid bit: sent by the general processor; the valid flag bit of the returned physical address of the vector memory access address page table translation request;
vector memory access address page table translation request returned physical address: sent by the general processor; the returned physical address of the vector memory access address page table translation request;
vector memory access address page table translation request returned physical address state: sent by the general processor; the state of the returned physical address of the vector memory access address page table translation request;
vector memory access address page table translation returned physical address response number: sent by the general processor; the accelerated calculation request number corresponding to the returned physical address of the vector memory access address page table translation request.
In addition, the invention also provides a processor, which comprises a general processor core, an interface component and a vector component, wherein the general processor core is connected with the vector component through the interface component, and the interface component is a cooperative interface of the general processor core and the vector component.
Compared with the prior art, the invention has the following advantages:
1. The invention targets the general-purpose processor core and the vector component: through an efficient, loosely coupled dedicated interface, the vector component interacts directly with the relevant components of the core, such as the control component, the register component, the data cache component, the floating point calculation component and the page table translation component, which improves the data read/write efficiency and the instruction execution efficiency of the vector component.
2. The interacting components adopt an autonomous flow control mode and do not need real-time intervention by the processor control component, so the interaction is efficient.
3. During vector calculation, operations such as part of the floating point calculation and the page table translation are completed by the corresponding components of the general-purpose processor, which reduces the hardware resource cost of the vector component.
Drawings
FIG. 1 is a schematic diagram of the basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of an implementation of the control path in an embodiment of the present invention.
FIG. 3 is a schematic diagram of an implementation of the memory access path in an embodiment of the present invention.
FIG. 4 is a schematic diagram of an implementation of the register path in an embodiment of the present invention.
FIG. 5 is a schematic diagram of an implementation of the floating point calculation path in an embodiment of the present invention.
FIG. 6 is a schematic diagram of an implementation of the page table translation path in an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of the cooperative interface in an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the method for coordinating a general-purpose processor core and a vector unit of the present embodiment includes:
1) The vector unit receives a vector request sent by the general-purpose processor core;
2) The vector unit determines whether the vector request needs to access cached data and, if so, sends a memory access request to the processor data cache unit of the general-purpose processor core, so that the processor data cache unit processes the memory access request and returns its data or state; the vector unit determines whether the vector request needs register access and, if so, sends a register request to the processor register unit of the general-purpose processor core, so that the processor register unit processes the register request and returns its data or state; the vector unit determines whether the vector request needs floating point calculation and, if so, sends a floating point calculation request to the processor floating point calculation unit of the general-purpose processor core, so that the processor floating point calculation unit processes the floating point calculation request and returns a floating point calculation result; the vector unit determines whether the vector request needs page table translation and, if so, sends a page table translation request to the processor page table translation unit of the general-purpose processor core, so that the processor page table translation unit processes the page table translation request and returns a page table translation result;
3) It is determined whether all operations of the vector request are finished; if not, execution jumps back to step 2); otherwise, the vector unit returns the final vector calculation result to the general-purpose processor core.
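The loop formed by steps 2) and 3) can be illustrated with a minimal software sketch. It is not part of the claimed hardware; the names `VectorRequest`, `pending_ops`, `run_vector_request` and the string tags "mem", "reg", "fp" and "tbw" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VectorRequest:
    req_id: int
    # Hypothetical: the sub-operations this request still needs, in order.
    pending_ops: List[str] = field(default_factory=list)


def run_vector_request(req: VectorRequest,
                       core_units: Dict[str, Callable[[int], str]]) -> List[str]:
    """Dispatch each required operation to the matching core unit (step 2)
    until none remain (step 3), then report the final result."""
    log: List[str] = []
    while req.pending_ops:              # step 3): not finished, repeat step 2)
        op = req.pending_ops.pop(0)
        handler = core_units[op]        # "mem", "reg", "fp" or "tbw"
        log.append(handler(req.req_id))
    log.append(f"result:{req.req_id}")  # final result returned to the core
    return log


# Stand-ins for the four core-side units; each just reports completion.
core_units = {
    "mem": lambda rid: f"mem-done:{rid}",
    "reg": lambda rid: f"reg-done:{rid}",
    "fp": lambda rid: f"fp-done:{rid}",
    "tbw": lambda rid: f"tbw-done:{rid}",
}
trace = run_vector_request(VectorRequest(7, ["tbw", "mem", "fp", "reg"]), core_units)
```

A request that needs none of the four services falls straight through to the final result, which mirrors the "otherwise" branch of step 3).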
As shown in fig. 2, step 1) in the present embodiment includes:
1.1) The vector unit receives, through the control path, a vector request sent by the general-purpose processor core. The received vector request includes an operation type (Vec_Type0), a request number (Vec_req_ID0), source operands A, B and C (Vec_req_SrcA0, Vec_req_SrcB0, Vec_req_SrcC0) and a result register number (Vec_req_Result_Reg_ID0), which are sent through the vector calculation request type, vector calculation request number, vector calculation request source operand A, vector calculation request source operand B, vector calculation request source operand C and vector calculation request result register number ports, respectively; the signals sent through these ports are marked valid by the vector calculation request valid bit;
1.2) After completing the reception and saving of the vector request, the vector unit notifies the processor control unit of the general-purpose processor core through the vector request receive feedback bit port of the control path.
This step aims to complete the sending and receiving of vector requests quickly through simplified interface content and a handshake response. Unlike the internal interfaces of a conventional processor core, whose interface signals and response signals must meet strict timing requirements, the method and interface of this embodiment relax the timing requirements between interface modules, i.e., they are loosely coupled.
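The loosely coupled handshake described above can be sketched in software as a sender that keeps its valid bit asserted until the receiver raises the feedback (acknowledge) bit, however many cycles that takes. This is only an illustrative model under stated assumptions: the class and field names, and the `busy_cycles` parameter that delays the receiver, are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VecReq:
    """Payload carried by the control path (field names are illustrative)."""
    req_type: int
    req_id: int
    src_a: int
    src_b: int
    src_c: int
    result_reg_id: int


class VectorUnitPort:
    """Receiver side: saves a request once it is able to accept one."""

    def __init__(self, busy_cycles: int) -> None:
        self.busy_cycles = busy_cycles      # hypothetical delay before accepting
        self.saved: Optional[VecReq] = None

    def cycle(self, vld: bool, req: VecReq) -> bool:
        """Advance one cycle and return the feedback (ack) bit."""
        if self.busy_cycles > 0:
            self.busy_cycles -= 1
            return False
        if vld and self.saved is None:
            self.saved = req                # reception and saving complete
            return True
        return False


def send(port: VectorUnitPort, req: VecReq, max_cycles: int = 100) -> int:
    """Sender side: hold valid until acknowledged; return cycles waited."""
    for waited in range(max_cycles):
        if port.cycle(True, req):
            return waited
    raise TimeoutError("no acknowledge received")
```

Because the sender only waits for the acknowledge, no fixed latency between request and response has to be agreed between the two sides, which is the property the text calls loose coupling.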
As shown in fig. 3, the sending of the memory access request to the processor data cache component of the general processor core in step 2) in this embodiment includes:
2.1A) For the cached data that needs to be accessed, the memory access operation type (Mem_Req_Type0), the memory access request number (Mem_Req_ID0), the memory access request physical address (Mem_Req_PA0) and the memory access request write data (Mem_Req_Wr_Data0) are sent to the processor data cache component through the memory access request type, memory access request number, memory access request physical address and memory access request write data ports, respectively; these signals are marked valid by the vector component memory access request valid bit;
2.2A) After completing the reception and saving of the memory access request, the processor data cache component notifies the vector component through the vector component memory access request receive feedback bit;
2.3A) If the memory access operation type is a data read, then after completing the read operation the processor data cache component sends the obtained data (Mem_Rsp_Data0), the data state (Mem_Rsp_State0) and the corresponding memory access request number (Mem_Rsp_ID0) to the vector component through the memory access request return data, memory access request return data state and memory access request return data number ports; these signals are marked valid by the vector component memory access request return data valid bit;
2.4A) If the memory access operation type is a data write, then after writing the memory access write data to the specified location the processor data cache component sends the data state (Mem_Rsp_State0) and the corresponding memory access request number (Mem_Rsp_ID0) to the vector component through the memory access request return data state and memory access request return data number ports; these signals are marked valid by the vector component memory access request return data valid bit.
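Steps 2.1A) to 2.4A) can be sketched as a small request/response model. The encodings `READ`, `WRITE`, `OK` and `FAULT`, the class names, and the choice to report a read of an unwritten address as `FAULT` are assumptions made for illustration; the embodiment does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, Optional

READ, WRITE = 0, 1   # hypothetical encodings of the request type
OK, FAULT = 0, 1     # hypothetical encodings of the returned state


@dataclass
class MemReq:
    req_type: int
    req_id: int
    pa: int
    wr_data: int = 0


@dataclass
class MemRsp:
    rsp_id: int                  # echoes the request number
    state: int
    data: Optional[int] = None   # only populated for reads


class DataCacheModel:
    """Toy stand-in for the processor data cache component."""

    def __init__(self) -> None:
        self.lines: Dict[int, int] = {}

    def handle(self, req: MemReq) -> MemRsp:
        if req.req_type == WRITE:
            self.lines[req.pa] = req.wr_data
            return MemRsp(req.req_id, OK)                      # 2.4A: state and number
        if req.pa in self.lines:
            return MemRsp(req.req_id, OK, self.lines[req.pa])  # 2.3A: data, state, number
        return MemRsp(req.req_id, FAULT)                       # assumption: miss reported as a state
```

The response always carries the request number back, so the vector component can match a response to the request that produced it.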
As shown in fig. 4, the sending of the register request to the processor register unit of the general-purpose processor core in step 2) in the embodiment includes:
2.1B) For the register access that is needed, the register operation type (Reg_Req_Type0), the register number (Reg_Req_Reg_ID0) and the register write data (Reg_Req_Wr_Data0) are sent to the processor register unit of the general-purpose processor core through the register request type, register request register number and register request write data ports of the register path, respectively; these signals are marked valid by the vector unit register request valid bit;
2.2B) After completing the reception and saving of the register request, the processor register unit notifies the vector unit through the vector unit register request receive feedback bit port of the register path;
2.3B) If the register operation type is a data read, then after completing the register read operation the processor register unit sends the obtained data (Reg_Rsp_Data0), the data state (Reg_Rsp_State0) and the corresponding register number (Reg_Rsp_ID0) to the vector unit through the register request return data, register request return data state and register request return data register number ports; these signals are marked valid by the vector unit register request return data valid bit;
2.4B) If the register operation type is a data write, then after writing the register write data into the specified register the processor register unit sends the register state (Reg_Rsp_State0) and the corresponding register number (Reg_Rsp_ID0) to the vector unit through the register request return data state and register request return data register number ports; these signals are marked valid by the vector unit register request return data valid bit.
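Unlike the memory access path, the register path response is identified by a register number rather than a request number. One way the vector unit could match such responses to the operations waiting on them is sketched below. The bookkeeping class `PendingRegReads` and the notion of named consumers are hypothetical and are not described in the embodiment.

```python
from typing import Dict, List


class PendingRegReads:
    """Vector-unit-side bookkeeping (illustrative): outstanding register
    reads are keyed by register number, as in steps 2.3B) and 2.4B)."""

    def __init__(self) -> None:
        self.waiting: Dict[int, List[str]] = {}   # register number -> waiting consumers
        self.values: Dict[int, int] = {}          # register number -> last returned data

    def issue_read(self, reg_id: int, consumer: str) -> None:
        """Record that `consumer` is waiting for the value of register `reg_id`."""
        self.waiting.setdefault(reg_id, []).append(consumer)

    def on_response(self, rsp_reg_id: int, rsp_data: int) -> List[str]:
        """Store the returned data and return the consumers it unblocks."""
        self.values[rsp_reg_id] = rsp_data
        return self.waiting.pop(rsp_reg_id, [])
```

A response for a register that nobody is waiting on simply unblocks nothing, so the sketch tolerates responses arriving in any order.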
As shown in fig. 5, the sending of the floating point calculation request to the floating point calculation unit of the processor of the general-purpose processor core in step 2) of the present embodiment includes:
2.1C) For the floating point calculation that needs to be performed, the floating point calculation type (Fp_Req_Type0), the floating point calculation request number (Fp_Req_ID0) and source operands A, B and C (Fp_Req_SrcA0, Fp_Req_SrcB0, Fp_Req_SrcC0) are sent to the processor floating point calculation unit through the floating point calculation request type, floating point calculation request number, floating point calculation request source operand A, floating point calculation request source operand B and floating point calculation request source operand C ports of the floating point calculation path, respectively; these signals are marked valid by the vector unit floating point calculation request valid bit;
2.2C) After completing the reception and saving of the floating point calculation request, the processor floating point calculation unit notifies the vector unit through the floating point calculation request receive feedback bit of the floating point calculation path;
2.3C) After completing the corresponding floating point calculation, the processor floating point calculation unit sends the obtained floating point calculation result data (Fp_Result_Data0), the floating point calculation result state (Fp_Result_State0) and the corresponding floating point calculation result number (Fp_Result_ID0) to the vector unit through the floating point calculation result data, floating point calculation result state and floating point calculation result number ports of the floating point calculation path; these signals are marked valid by the floating point calculation result valid bit.
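The request and result of steps 2.1C) to 2.3C) can be sketched as a function that takes a type, a request number and three source operands and returns result data, a state and the echoed number. The operation encodings (`FP_ADD`, `FP_MUL`, `FP_FMA`) and state encodings are assumptions; the embodiment does not list which operation types exist or how the state is encoded.

```python
import math
from typing import Tuple

FP_ADD, FP_MUL, FP_FMA = 0, 1, 2   # hypothetical type encodings
ST_OK, ST_INVALID = 0, 1           # hypothetical state encodings


def fp_unit(req_type: int, req_id: int,
            a: float, b: float, c: float) -> Tuple[float, int, int]:
    """Return (result data, result state, result number) for one request."""
    if req_type == FP_ADD:
        data = a + b
    elif req_type == FP_MUL:
        data = a * b
    elif req_type == FP_FMA:
        data = a * b + c           # note: rounds twice, unlike a true fused multiply-add
    else:
        return (math.nan, ST_INVALID, req_id)   # unknown type reported via the state
    state = ST_INVALID if math.isnan(data) else ST_OK
    return (data, state, req_id)
```

Three operands are enough for a multiply-add style operation, which is one plausible reason the path carries operands A, B and C; that reading is an inference, not a statement from the source.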
As shown in fig. 6, the sending of the page table translation request to the processor page table translation component of the general-purpose processor core in step 2) of this embodiment includes:
2.1D) For the vector memory access address that needs page table translation, the virtual address (Tbw_Req_Va0) and the page table translation request number (Tbw_Req_ID0) are sent to the processor page table translation component through the vector memory access address page table translation request virtual address and vector memory access address page table translation request number ports of the page table translation path, respectively; these signals are marked valid by the vector memory access address page table translation request valid bit;
2.2D) After completing the page table translation of the corresponding vector memory access address, the processor page table translation component sends the obtained return physical address (Tbw_Result_Pa0), the return physical address state (Tbw_Result_State0) and the corresponding return physical address response number (Tbw_Result_ID0) to the vector component through the vector memory access address page table translation request return physical address, return physical address state and return physical address response number ports of the page table translation path; these signals are marked valid by the vector memory access address page table translation request return physical address valid bit.
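Steps 2.1D) and 2.2D) amount to a virtual-to-physical lookup whose response carries an address, a state and the echoed request number. The sketch below assumes 4 KiB pages and a single-level table held in a dictionary; both are illustrative assumptions, since the embodiment does not specify the page size or table format.

```python
from typing import Dict, Optional, Tuple

PAGE_SHIFT = 12              # assumption: 4 KiB pages
TBW_OK, TBW_FAULT = 0, 1     # hypothetical state encodings


def translate(page_table: Dict[int, int], req_va: int,
              req_id: int) -> Tuple[Optional[int], int, int]:
    """Return (physical address, state, response number) for one request."""
    vpn = req_va >> PAGE_SHIFT                   # virtual page number
    offset = req_va & ((1 << PAGE_SHIFT) - 1)    # offset within the page
    ppn = page_table.get(vpn)
    if ppn is None:
        return (None, TBW_FAULT, req_id)         # no mapping: report via the state
    return ((ppn << PAGE_SHIFT) | offset, TBW_OK, req_id)
```

The physical address returned here is what the vector component would then place on the memory access path as the request physical address.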
As shown in fig. 2, returning the final vector calculation result to the general-purpose processor core by the vector unit in step 3) of this embodiment includes: after finishing all operations required by the vector request Vec_Inst0, the vector unit sends the final vector calculation result state (Vec_Result_State0), the vector calculation result number (Vec_Result_ID0) and the vector unit state (Vec_Status) to the processor control unit through the vector calculation result state, vector calculation result number and vector unit state ports of the control path; these signals are marked valid by the vector calculation result valid bit.
The operations in step 2) are performed by the vector unit according to the actual needs of the vector request. Their purpose is to send the corresponding operation requests to the processor data cache unit, the processor register unit, the processor floating point calculation unit and the processor page table translation unit of the general-purpose processor core, and to receive the specified results or feedback, quickly through simplified interface content and handshake responses. Besides relaxing the strict timing requirements between interface modules (the loosely coupled characteristic), this also avoids duplicating, inside the vector unit, functional units that already exist in the general-purpose processor core and are in fact idle most of the time while a vector request is being processed; compared with a conventional independent vector accelerator (e.g., a GPU), the implementation cost of such additional functional units is reduced.
As shown in fig. 7, the present embodiment further provides a cooperative interface of a general-purpose processor core and a vector unit, for applying the aforementioned cooperative method of a general-purpose processor core and a vector unit, where the cooperative interface includes:
a control path, used for transferring operation instructions and states between the processor control unit of the general-purpose processor core and the vector unit;
a register path, used for transferring register requests and data between the vector unit and the processor register unit of the general-purpose processor core;
a memory access path, used for transferring memory access requests and data between the vector unit and the processor data cache unit of the general-purpose processor core;
a floating point calculation path, used for transferring requests and data of floating point operations between the vector unit and the processor floating point calculation unit of the general-purpose processor core;
a page table translation path, used for transferring requests and data of page table translation operations between the vector unit and the processor page table translation unit of the general-purpose processor core; the control path, the register path, the memory access path, the floating point calculation path and the page table translation path are each connected between the general-purpose processor core and the vector unit.
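The five paths and their endpoints can be summarized in a small lookup structure. The sketch below is illustrative only: the `Path` enumeration and the `ENDPOINTS` table are hypothetical names, and the requester/responder roles are read from the method steps (the core initiates on the control path, the vector unit initiates on the other four).

```python
from enum import Enum
from typing import Dict, Tuple


class Path(Enum):
    CONTROL = "control"
    REGISTER = "register"
    MEMORY = "memory access"
    FLOAT = "floating point calculation"
    PAGE_TABLE = "page table translation"


# (requester, responder) for each path of the cooperative interface.
ENDPOINTS: Dict[Path, Tuple[str, str]] = {
    Path.CONTROL: ("processor control unit", "vector unit"),
    Path.REGISTER: ("vector unit", "processor register unit"),
    Path.MEMORY: ("vector unit", "processor data cache unit"),
    Path.FLOAT: ("vector unit", "processor floating point calculation unit"),
    Path.PAGE_TABLE: ("vector unit", "processor page table translation unit"),
}


def responder(path: Path) -> str:
    """Return the unit that services requests on the given path."""
    return ENDPOINTS[path][1]
```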
The control path is used for the processor control unit and the vector unit to transfer operation instructions and states. Referring to fig. 2, the control path in the present embodiment includes a vector request valid bit (core_vec_req_vld), a vector request type (core_vec_req_type), a vector request number (core_vec_req_id), a vector request source operand A (core_vec_req_src), a vector request source operand B (core_vec_req_src), a vector request source operand C (core_vec_req_src), a vector request result register number (core_vec_req_result_reg_id), a vector request receive feedback bit (vec_core_req_ack), a vector calculation result valid bit (vec_core_result_vld), a vector calculation result state (vec_core_result_state), a vector calculation result number (vec_core_result_state) and a vector unit state (vec_core_result_state), wherein: the vector request valid bit is sent by the general-purpose processor and indicates that a vector request is being sent; the vector request type is sent by the general-purpose processor and represents the operation type of the vector request; the vector request number is sent by the general-purpose processor and represents the number of the vector request within the general-purpose processor; the vector request source operand A is sent by the general-purpose processor and represents the data or register number of operand A of the vector request; the vector request source operand B is sent by the general-purpose processor and represents the data or register number of operand B of the vector request; the vector request source operand C is sent by the general-purpose processor and represents the data or register number of operand C of the vector request; the vector request result register number is sent by the general-purpose processor and represents the result register number to be used by the vector request; the vector request receive feedback bit is sent by the vector unit and indicates that the vector request has been successfully received by the vector unit; the vector calculation result valid bit is sent by the vector unit and indicates that the result of the vector request is valid; the vector calculation result state is sent by the vector unit and represents the state of the result data of the vector request; the vector calculation result number is sent by the vector unit and represents the number, within the general-purpose processor, of the request that produced the vector calculation result data; the vector unit state is sent by the vector unit and represents the running state of the vector unit;
the memory access path is used for the vector component and the processor data cache component to transfer memory access requests and data. Referring to fig. 3, the memory access path in the present embodiment includes a vector component memory access request valid bit (vec_core_mem_req_vld), a vector component memory access request type (vec_core_mem_req_type), a vector component memory access request number (vec_core_mem_req_id), a vector component memory access request physical address (vec_core_mem_req_pa), a vector component memory access request write data (vec_core_mem_req_wr_data), a vector component memory access request receive feedback bit (core_vec_mem_req_ack), a vector component memory access request return data valid bit (core_vec_mem_rsp_vld), a vector component memory access request return data (core_vec_mem_rsp_data), a vector component memory access request return data state (core_vec_mem_rsp_state) and a vector component memory access request return data number (core_vec_mem_rsp_id), wherein: the vector component memory access request valid bit is sent by the vector component and indicates that a vector component memory access request is being sent; the vector component memory access request type is sent by the vector component and represents the operation type of the request; the vector component memory access request number is sent by the vector component and represents the number of the request; the vector component memory access request physical address is sent by the vector component and represents the physical address of the request; the vector component memory access request write data is sent by the vector component and represents the write data of the request; the vector component memory access request receive feedback bit is sent by the processor data cache component and indicates that the request has been successfully received by the data cache component; the vector component memory access request return data valid bit is sent by the processor data cache component and indicates that the return data of the request is valid; the vector component memory access request return data is sent by the processor data cache component and represents the return data of the request; the vector component memory access request return data state is sent by the processor data cache component and represents the state of the return data of the request; the vector component memory access request return data number is sent by the processor data cache component and represents the memory access request number corresponding to the return data of the request;
the register path is used for the vector unit and the processor register unit to transfer register requests and data. Referring to fig. 4, the register path in the present embodiment includes a vector unit register request valid bit (vec_core_reg_req_vld), a vector unit register request type (vec_core_reg_req_type), a vector unit register request register number (vec_core_reg_req_reg_id), a vector unit register request write data (vec_core_reg_req_wr_data), a vector unit register request receive feedback bit (core_vec_reg_req_ack), a vector unit register request return data valid bit (core_vec_reg_rsp_vld), a vector unit register request return data (core_vec_reg_rsp_data), a vector unit register request return data state (core_vec_reg_rsp_state) and a vector unit register request return data register number (core_vec_reg_rsp_reg_id), wherein: the vector unit register request valid bit is sent by the vector unit and indicates that a vector unit register request is being sent; the vector unit register request type is sent by the vector unit and represents the operation type of the request; the vector unit register request register number is sent by the vector unit and represents the register number corresponding to the request; the vector unit register request write data is sent by the vector unit and represents the write data of the request; the vector unit register request receive feedback bit is sent by the processor register unit and indicates that the request has been successfully received by the register unit; the vector unit register request return data valid bit is sent by the processor register unit and indicates that the return data of the request is valid; the vector unit register request return data is sent by the processor register unit and represents the return data of the request; the vector unit register request return data state is sent by the processor register unit and represents the state of the return data of the request; the vector unit register request return data register number is sent by the processor register unit and represents the register number corresponding to the return data of the request;
the floating point calculation path is used for the vector unit and the processor floating point calculation unit to transfer requests and data of floating point operations. Referring to fig. 5, the floating point calculation path in the present embodiment includes a floating point calculation request valid bit (vec_core_fp_req_vld), a floating point calculation request type (vec_core_fp_req_type), a floating point calculation request number (vec_core_fp_req_id), a floating point calculation request source operand A (vec_core_fp_req_src), a floating point calculation request source operand B (vec_core_fp_req_src), a floating point calculation request source operand C (vec_core_fp_req_src), a floating point calculation request receive feedback bit (core_vec_fp_req_ack), a floating point calculation result valid bit (core_vec_result_vld), floating point calculation result data (core_vec_fp_result), a floating point calculation result state (core_result_status) and a floating point calculation result number (vec_result_status), wherein: the floating point calculation request valid bit is sent by the vector unit and indicates that a floating point calculation request is being sent; the floating point calculation request type is sent by the vector unit and represents the operation type of the request; the floating point calculation request number is sent by the vector unit and represents the number of the request; the floating point calculation request source operand A is sent by the vector unit and represents the data of operand A of the request; the floating point calculation request source operand B is sent by the vector unit and represents the data of operand B of the request; the floating point calculation request source operand C is sent by the vector unit and represents the data of operand C of the request; the floating point calculation request receive feedback bit is sent by the processor floating point calculation unit and indicates that the request has been successfully received by the floating point calculation unit; the floating point calculation result valid bit is sent by the processor floating point calculation unit and indicates that the result data of the request is valid; the floating point calculation result data is sent by the processor floating point calculation unit and represents the result data of the request; the floating point calculation result state is sent by the processor floating point calculation unit and represents the state parameter of the result data of the request; the floating point calculation result number is sent by the processor floating point calculation unit and represents the floating point calculation request number corresponding to the result data;
the page table translation path is used for the vector unit and the processor page table translation unit to transfer requests and data of page table translation operations. Referring to fig. 6, the page table translation path in the present embodiment includes a vector memory access address page table translation request valid bit (vec_core_tbw_req_vld), a vector memory access address page table translation request virtual address (vec_core_tbw_req_va), a vector memory access address page table translation request number (vec_core_tbw_req_id), a vector memory access address page table translation request return physical address valid bit (core_vec_tbw_rsp_vld), a vector memory access address page table translation request return physical address (core_vec_tbw_rsp_pa), a vector memory access address page table translation request return physical address state (core_vec_tbw_rsp_state) and a vector memory access address page table translation return physical address response number (core_vec_tbw_rsp_id), wherein: the vector memory access address page table translation request valid bit is sent by the vector unit and indicates that a vector memory access address page table translation request is being sent; the vector memory access address page table translation request virtual address is sent by the vector unit and represents the virtual address of the request; the vector memory access address page table translation request type is sent by the vector unit and represents the type of the request; the vector memory access address page table translation request number is sent by the vector unit and represents the accelerated calculation request number corresponding to the request; the vector memory access address page table translation request return physical address valid bit is sent by the general-purpose processor and indicates that the returned physical address of the request is valid; the vector memory access address page table translation request return physical address is sent by the general-purpose processor and represents the returned physical address of the request; the vector memory access address page table translation request return physical address state is sent by the general-purpose processor and represents the state of the returned physical address of the request; the vector memory access address page table translation return physical address response number is sent by the general-purpose processor and represents the accelerated calculation request number corresponding to the returned physical address of the request.
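In the signal names listed above, the prefix appears to encode the direction: names beginning with core_vec_ are described as sent by the general-purpose processor core, and names beginning with vec_core_ as sent by the vector unit. This reading is an observation drawn from the listed names, not a rule stated by the source. A minimal sketch of that convention, with a hypothetical helper `driver`, follows; it strips spaces so that names written with spaces around the underscores are also accepted.

```python
def driver(signal_name: str) -> str:
    """Infer which side drives a signal from its name prefix
    (assumed convention: <source>_<destination>_...)."""
    name = signal_name.replace(" ", "").lower()
    if name.startswith("core_vec_"):
        return "general-purpose processor core"
    if name.startswith("vec_core_"):
        return "vector unit"
    raise ValueError(f"unrecognized prefix: {signal_name!r}")
```

For example, under this convention a request valid bit and its receive feedback bit on the same path always carry opposite prefixes.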
In addition, this embodiment further provides a processor, which includes a general processor core, an interface component, and a vector component, where the general processor core is connected to the vector component through the interface component, the interface component is a cooperative interface between the general processor core and the vector component, and the general processor core may be a single core or multiple cores.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiment; all technical solutions that follow the idea of the present invention fall within its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principles of the present invention should also be considered as within the protection scope of the present invention.

Claims (10)

1. A method for coordinating a general purpose processor core with a vector component, comprising:
1) The vector component receives a vector request sent by the general-purpose processor core;
2) The vector component determines whether the vector request needs to access cached data, and if so, the vector component sends a memory access request to the processor data cache component of the general-purpose processor core, so that the processor data cache component processes the memory access request and returns the data or the status of the memory access request; the vector component determines whether the vector request needs register access, and if so, the vector component sends a register request to the processor register unit of the general-purpose processor core, so that the processor register unit processes the register request and returns the data or the status of the register request; the vector component determines whether the vector request needs floating point calculation, and if so, the vector component sends a floating point calculation request to the processor floating point calculation unit of the general-purpose processor core, so that the processor floating point calculation unit processes the floating point calculation request and returns a floating point calculation result; the vector component determines whether the vector request needs page table translation, and if so, the vector component sends a page table translation request to the processor page table translation component of the general-purpose processor core, so that the processor page table translation component processes the page table translation request and returns a page table translation result;
3) Determining whether the operation is finished, and if not, returning to step 2); otherwise, the vector component returns the final vector calculation result to the general-purpose processor core.
2. The method for coordinating a general-purpose processor core with a vector component as claimed in claim 1, wherein step 1) comprises: 1.1) the vector component receives, through the control path, the vector request sent by the general-purpose processor core, the received vector request comprising the operation type, request number, source operands A, B and C, and result register number, sent respectively through the request type, vector calculation request number, vector calculation request source operand A, vector calculation request source operand B, vector calculation request source operand C and vector calculation request result register number ports, and the signals sent through these ports are marked valid by the vector calculation request valid bit; 1.2) after receiving and saving the vector request, the vector component notifies the processor control unit of the general-purpose processor core through the vector request receive feedback bit port of the control path.
3. The method for coordinating a general-purpose processor core with a vector component according to claim 2, wherein the step 2) of sending a memory access request to the processor data cache component of the general-purpose processor core comprises: 2.1A) for the cached data to be accessed, the memory access operation type, memory access request number, memory access request physical address and memory access request write data are sent to the processor cache data component through the memory access request type, memory access request number, memory access request physical address and memory access request write data ports of the memory access path, respectively, and these signals are marked valid by the vector component memory access request valid bit; 2.2A) after receiving and saving the memory access request, the processor cache data component notifies the vector component through the vector component memory access request receive feedback bit; 2.3A) if the memory access operation type is a data read, then after completing the read the processor cache data component sends the data obtained, the data status and the corresponding memory access request number to the vector component through the memory access request return data, memory access request return data state and memory access request return data number ports, and these signals are marked valid by the vector component memory access request return data valid bit; 2.4A) if the memory access operation type is a data write, then after writing the memory access write data to the designated location the processor cache data component sends the data status and the corresponding memory access request number to the vector component through the memory access request return data state and memory access request return data number ports, and these signals are marked valid by the vector component memory access request return data valid bit.
4. The method of claim 2, wherein the step 2) of sending the register request to the processor register unit of the general-purpose processor core comprises: 2.1B) for the register to be accessed, the register operation type, register number and register write data are sent to the processor register unit of the general-purpose processor core through the register request type, register request register number and register request write data ports of the register path, respectively, and these signals are marked valid by the vector unit register request valid bit; 2.2B) after receiving and saving the register request, the processor register unit notifies the vector unit through the vector unit register request receive feedback bit port of the register path; 2.3B) if the register operation type is a data read, then after completing the register read the processor register unit sends the data obtained, the data status and the corresponding register number to the vector unit through the register request return data, register request return data state and register request return data register number ports, and these signals are marked valid by the vector unit register request return data valid bit; 2.4B) if the register operation type is a data write, then after writing the register write data into the designated register the processor register unit sends the register status and the corresponding register number to the vector unit through the register request return data state and register request return data register number ports, and these signals are marked valid by the vector unit register request return data valid bit.
5. The method of claim 2, wherein the step 2) of sending the floating point calculation request to the processor floating point calculation unit of the general-purpose processor core comprises: 2.1C) for the floating point calculation to be processed, the floating point calculation type, floating point calculation request number and source operands A, B and C are sent to the processor floating point calculation unit through the floating point calculation request type, floating point calculation request number, floating point calculation request source operand A, floating point calculation request source operand B and floating point calculation request source operand C ports of the floating point calculation path, respectively, and these signals are marked valid by the vector unit floating point calculation request valid bit; 2.2C) after receiving and saving the floating point calculation request, the processor floating point calculation unit notifies the vector unit through the floating point calculation request receive feedback bit of the floating point calculation path; 2.3C) after completing the corresponding floating point calculation, the processor floating point calculation unit sends the resulting floating point calculation result data, floating point calculation result state and corresponding floating point calculation result number to the vector unit through the floating point calculation result data, floating point calculation result state and floating point calculation result number ports of the floating point calculation path, and these signals are marked valid by the floating point calculation result valid bit.
6. The method of claim 2, wherein the step 2) of sending a page table translation request to the processor page table translation component of the general-purpose processor core comprises: 2.1D) for the vector memory access address that needs page table translation, the virtual address and the page table translation request number are sent to the processor page table translation component through the vector memory access address page table translation request virtual address and vector memory access address page table translation request number ports of the page table translation path, respectively, and these signals are marked valid by the vector address page table translation request valid bit; 2.2D) after completing the page table translation of the corresponding vector memory access address, the processor page table translation component sends the returned physical address, the returned physical address state and the corresponding returned physical address response number to the vector component through the vector memory access address page table translation request return physical address, vector memory access address page table translation request return physical address state and vector memory access address page table translation return physical address response number ports of the page table translation path, and these signals are marked valid by the vector memory access address page table translation request return physical address valid bit.
7. The method as claimed in claim 1, wherein the step 3) of returning the final vector calculation result to the general-purpose processor core by the vector component comprises: after completing all operations required by the vector request Vec_Inst0, the vector component sends the final vector calculation result state, the vector calculation result number and the vector component state to the processor control component through the vector calculation result state, vector calculation result number and vector component state ports of the control path, and these signals are marked valid by the vector calculation result valid bit.
8. A cooperative interface between a general-purpose processor core and a vector unit, for applying the method of any one of claims 1 to 7, wherein the cooperative interface comprises: a control path, through which the processor control unit of the general-purpose processor core and the vector unit pass operation instructions and states; a register path, through which the processor register unit of the general-purpose processor core and the vector unit pass register requests and data; a memory access path, through which the vector unit and the processor cache data component of the general-purpose processor core pass memory access requests and data; a floating point calculation path, through which the vector unit and the processor floating point calculation unit of the general-purpose processor core pass requests and data for floating point operations; and a page table translation path, through which the vector unit and the processor page table translation unit of the general-purpose processor core pass requests and data for page table translation operations; the control path, the register path, the memory access path, the floating point calculation path and the page table translation path are each connected between the general-purpose processor core and the vector unit.
9. The cooperative interface between a general-purpose processor core and a vector unit as claimed in claim 8, wherein: the control path includes a vector request valid bit, a vector request type, a vector request number, a vector request source operand A, a vector request source operand B, a vector request source operand C, a vector request result register number, a vector request receive feedback bit, a vector calculation result valid bit, a vector calculation result state, a vector calculation result number and a vector component state, wherein: the vector request valid bit is sent by the general-purpose processor core and indicates that a vector request is being sent; the vector request type is sent by the general-purpose processor core and indicates the operation type of the vector request; the vector request number is sent by the general-purpose processor core and indicates the number of the vector request within the general-purpose processor core; the vector request source operand A is sent by the general-purpose processor core and indicates the data or register number of operand A of the vector request; the vector request source operand B is sent by the general-purpose processor core and indicates the data or register number of operand B of the vector request; the vector request source operand C is sent by the general-purpose processor core and indicates the data or register number of operand C of the vector request; the vector request result register number is sent by the general-purpose processor core and indicates the result register number to be used by the vector request; the vector request receive feedback bit is sent by the vector component and indicates that the vector request was successfully received by the vector component; the vector calculation result valid bit is sent by the vector component and is the valid flag of the result of the vector request; the vector calculation result state is sent by the vector component and indicates the state of the result data of the vector request; the vector calculation result number is sent by the vector component and indicates the number, within the general-purpose processor core, of the request that produced the vector calculation result data; the vector component state is sent by the vector component and indicates the running state of the vector component;
the register path includes a vector unit register request valid bit, a vector unit register request type, a vector unit register request register number, vector unit register request write data, a vector unit register request receive feedback bit, a vector unit register request return data valid bit, vector unit register request return data, a vector unit register request return data state and a vector unit register request return data register number, wherein: the vector unit register request valid bit is sent by the vector unit and is the flag indicating that a vector unit register request is being sent; the vector unit register request type is sent by the vector unit and indicates the operation type of the vector unit register request; the vector unit register request register number is sent by the vector unit and indicates the register number targeted by the vector unit register request; the vector unit register request write data is sent by the vector unit and carries the data to be written by the vector unit register request; the vector unit register request receive feedback bit is sent by the processor register unit and indicates that the vector unit register request was successfully received by the register unit; the vector unit register request return data valid bit is sent by the processor register unit and is the valid flag of the data returned for the vector unit register request; the vector unit register request return data is sent by the processor register unit and carries the data returned for the vector unit register request; the vector unit register request return data state is sent by the processor register unit and indicates the state of the data returned for the vector unit register request; the vector unit register request return data register number is sent by the processor register unit and indicates the register number corresponding to the data returned for the vector unit register request;
the memory access path includes a vector component memory access request valid bit, a vector component memory access request type, a vector component memory access request number, a vector component memory access request physical address, vector component memory access request write data, a vector component memory access request receive feedback bit, a vector component memory access request return data valid bit, vector component memory access request return data, a vector component memory access request return data state and a vector component memory access request return data number, wherein: the vector component memory access request valid bit is sent by the vector component and is the flag indicating that a vector component memory access request is being sent; the vector component memory access request type is sent by the vector component and indicates the operation type of the vector component memory access request; the vector component memory access request number is sent by the vector component and indicates the number of the vector component memory access request; the vector component memory access request physical address is sent by the vector component and indicates the physical address of the vector component memory access request; the vector component memory access request write data is sent by the vector component and carries the data to be written by the vector component memory access request; the vector component memory access request receive feedback bit is sent by the processor cache data component and indicates that the vector component memory access request was successfully received by the cache data component; the vector component memory access request return data valid bit is sent by the processor cache data component and is the valid flag of the data returned for the vector component memory access request; the vector component memory access request return data is sent by the processor cache data component and carries the data returned for the vector component memory access request; the vector component memory access request return data state is sent by the processor cache data component and indicates the state of the data returned for the vector component memory access request; the vector component memory access request return data number is sent by the processor cache data component and indicates the memory access request number corresponding to the data returned for the vector component memory access request;
the floating point calculation path includes a floating point calculation request valid bit, a floating point calculation request type, a floating point calculation request number, a floating point calculation request source operand A, a floating point calculation request source operand B, a floating point calculation request source operand C, a floating point calculation request receive feedback bit, a floating point calculation result valid bit, floating point calculation result data, a floating point calculation result state and a floating point calculation result number, wherein: the floating point calculation request valid bit is sent by the vector unit and is the flag indicating that a floating point calculation request is being sent; the floating point calculation request type is sent by the vector unit and indicates the operation type of the floating point calculation request; the floating point calculation request number is sent by the vector unit and indicates the number of the floating point calculation request; the floating point calculation request source operand A is sent by the vector unit and carries the data of operand A of the floating point calculation request; the floating point calculation request source operand B is sent by the vector unit and carries the data of operand B of the floating point calculation request; the floating point calculation request source operand C is sent by the vector unit and carries the data of operand C of the floating point calculation request; the floating point calculation request receive feedback bit is sent by the processor floating point calculation unit and indicates that the floating point calculation request was successfully received by the floating point unit; the floating point calculation result valid bit is sent by the processor floating point calculation unit and is the valid flag of the result data of the floating point calculation request; the floating point calculation result data is sent by the processor floating point calculation unit and carries the result data of the floating point calculation request; the floating point calculation result state is sent by the processor floating point calculation unit and indicates the state parameter of the result data of the floating point calculation request; the floating point calculation result number is sent by the processor floating point calculation unit and indicates the floating point calculation number corresponding to the result data of the floating point calculation request;
the page table translation path includes a vector address page table translation request valid bit, a vector memory access address page table translation request virtual address, a vector memory access address page table translation request number, a vector memory access address page table translation request return physical address valid bit, a vector memory access address page table translation request return physical address, a vector memory access address page table translation request return physical address state and a vector memory access address page table translation return physical address response number, wherein: the vector memory access address page table translation request valid bit is sent by the vector component and is the flag indicating that a vector memory access address page table translation request is being sent; the vector memory access address page table translation request address is sent by the vector component and indicates the virtual address of the vector memory access address page table translation request; the vector memory access address page table translation request type is sent by the vector component and indicates the type of the vector memory access address page table translation request; the vector memory access address page table translation request number is sent by the vector component and indicates the accelerated calculation request number corresponding to the vector memory access address page table translation request; the vector memory access address page table translation request return physical address valid bit is sent by the general-purpose processor core and is the valid flag of the physical address returned for the vector memory access address page table translation request; the vector memory access address page table translation request return physical address is sent by the general-purpose processor core and carries the physical address returned for the vector memory access address page table translation request; the vector memory access address page table translation request return data state is sent by the general-purpose processor core and indicates the state of the physical address returned for the vector memory access address page table translation request; the vector memory access address page table translation return physical address number is sent by the general-purpose processor core and indicates the accelerated calculation request number corresponding to the physical address returned for the vector memory access address page table translation request.
10. A processor comprising a general purpose processor core, an interface unit and a vector unit, said general purpose processor core being connected to the vector unit via the interface unit, characterized in that said interface unit is a cooperative interface of a general purpose processor core and a vector unit as claimed in claim 8 or 9.
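The control flow of the method in claim 1 can be sketched in Python as follows. This is a simplified software model under stated assumptions, not the claimed hardware: the four core-side units are replaced by hypothetical handler functions, the `"ok"` status string, the fused multiply-add operation and the fixed-offset address mapping are invented purely for illustration, and the valid-bit and feedback-bit handshakes of claims 2 to 7 are omitted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a core-side unit: payload in, (data, status) out.
Handler = Callable[[dict], Tuple[object, str]]


def make_core_units(memory: Dict[int, float],
                    regs: Dict[int, float]) -> Dict[str, Handler]:
    """Build illustrative handlers for the four core-side units of claim 1."""

    def cache(req: dict):
        # Memory access request: read or write by physical address.
        if req["op"] == "write":
            memory[req["addr"]] = req["data"]
            return None, "ok"
        return memory[req["addr"]], "ok"

    def register(req: dict):
        # Register request: read or write by register number.
        if req["op"] == "write":
            regs[req["reg"]] = req["data"]
            return None, "ok"
        return regs[req["reg"]], "ok"

    def fpu(req: dict):
        # Floating point request: a*b+c, chosen only as an example operation.
        return req["a"] * req["b"] + req["c"], "ok"

    def page_table(req: dict):
        # Page table translation: fixed offset, purely illustrative.
        return req["vaddr"] + 0x1000, "ok"

    return {"mem": cache, "reg": register, "fp": fpu, "pt": page_table}


def run_vector_request(steps: List[Tuple[str, dict]],
                       units: Dict[str, Handler]) -> dict:
    """Steps 2) and 3) of claim 1: issue sub-requests until none remain."""
    results = []
    pending = list(steps)
    while pending:                           # step 3): not finished, repeat 2)
        kind, payload = pending.pop(0)
        data, status = units[kind](payload)  # step 2): forward to matching unit
        if status != "ok":
            return {"status": status, "results": results}
        results.append(data)
    return {"status": "done", "results": results}  # final result to the core
```

A usage example: with `memory = {0x2000: 2.0}` and `regs = {1: 3.0}`, running the steps `("pt", {"vaddr": 0x1000})`, `("mem", {"op": "read", "addr": 0x2000})`, `("reg", {"op": "read", "reg": 1})`, `("fp", {"a": 2.0, "b": 3.0, "c": 1.0})` and `("reg", {"op": "write", "reg": 2, "data": 7.0})` visits each of the four units in turn and leaves `7.0` in register 2. In this sketch the sub-requests are listed up front; in the claimed method the vector component decides which sub-requests a vector request needs.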
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111123314.7A CN113806250B (en) 2021-09-24 2021-09-24 Method for coordinating general processor core and vector component, interface and processor

Publications (2)

Publication Number Publication Date
CN113806250A CN113806250A (en) 2021-12-17
CN113806250B true CN113806250B (en) 2022-10-18

Family

ID=78940374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111123314.7A Active CN113806250B (en) 2021-09-24 2021-09-24 Method for coordinating general processor core and vector component, interface and processor

Country Status (1)

Country Link
CN (1) CN113806250B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant