WO2023010232A1 - A processor and a communication method - Google Patents

A processor and a communication method

Info

Publication number
WO2023010232A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
core
input parameters
slave core
parameter
Application number
PCT/CN2021/109934
Other languages
English (en)
French (fr)
Inventor
林灏勋
刘虎
王徐生
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21952142.4A (EP4379564A1)
Priority to CN202180101086.5A (CN117751356A)
Priority to PCT/CN2021/109934 (WO2023010232A1)
Publication of WO2023010232A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors

Definitions

  • The embodiments of the present application relate to the communication field, and in particular to a processor and a communication method.
  • A convolutional neural network requires some computer vision (CV) algorithms for pre- and post-processing. Because CV algorithms are highly parallel, vector processors are generally used to run them.
  • CV: computer vision
  • The first method is full decoupling, in which the vector processor is a completely independent core.
  • The second method is loose coupling, in which the vector processor obtains tasks from the main processor and runs them in parallel with the main processor until the vector processor and the main processor synchronize.
  • The third method is tight coupling, in which the vector processor can be regarded as part of the main processor.
  • Embodiments of the present application provide a processor and a communication method.
  • The master core in the processor is configured to, when it determines that the input parameters of the second function are the same as those of the first function, control the slave core to run the second function based on the input parameters of the first function. This reduces the overhead of passing the input parameters of the second function and shortens the time the slave core takes to run the second function.
  • A first aspect of the embodiments of the present application provides a processor.
  • The processor includes a master core and a slave core, and can be applied in scenarios such as vehicle-mounted visual perception devices and mobile terminals.
  • The master core is configured to transfer the input parameters of a first function to the slave core and to send first information to the slave core, where the first information is used by the slave core to obtain the first function. The slave core is configured to obtain the first function according to the first information and to run the first function according to the input parameters of the first function. The master core is further configured to send second information to the slave core, where the second information is used by the slave core to obtain a second function, and the slave core is further configured to obtain the second function according to the second information. When the master core determines that the input parameters of the second function are the same as those of the first function, it controls the slave core to run the second function based on the input parameters of the first function.
  • In this way, the overhead of transferring the input parameters of the second function from the master core to the slave core is reduced, and so is the time the slave core takes to run the second function.
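The decision the master core makes before dispatching the second function can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper name `plan_transfer` is hypothetical, parameters are assumed to be plain integers compared position by position, and a second function with no parameter in common is assumed to fall back to a full transfer (the text only says that differing inputs are sent in full).

```python
from typing import Dict, List, Optional, Tuple, Union

# ("reuse", None) | ("diff", {index: new_value}) | ("full", [all new values])
Plan = Tuple[str, Optional[Union[Dict[int, int], List[int]]]]


def plan_transfer(first_params: List[int], second_params: List[int]) -> Plan:
    """Choose how a master core could hand the second function's inputs to the slave core.

    Hypothetical helper: parameters are compared position by position.
    """
    if len(first_params) != len(second_params):
        # Different parameter layout: nothing to reuse, send everything.
        return ("full", list(second_params))
    diff = {
        i: new
        for i, (old, new) in enumerate(zip(first_params, second_params))
        if old != new
    }
    if not diff:
        # All inputs identical: rerun with the parameters the slave core already holds.
        return ("reuse", None)
    if len(diff) < len(second_params):
        # Some inputs identical: send only the difference parameters.
        return ("diff", diff)
    # No input in common: fall back to sending the full parameter list.
    return ("full", list(second_params))
```

For example, `plan_transfer([1, 2, 3], [1, 9, 3])` yields `("diff", {1: 9})`, so only one value would need to cross from the master core to the slave core.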
  • That the input parameters of the second function are the same as those of the first function includes the case in which the input parameters of the second function and those of the first function share some identical parameters (the same partial parameters). In this case, the master core is further configured to transfer difference parameters to the slave core and to control the slave core to run the second function based on the difference parameters and the input parameters of the first function.
  • A difference parameter is a parameter, among the input parameters of the second function, that differs from the input parameters of the first function.
  • The slave core is further configured to store the input parameters of the first function in multiple registers. The master core is further configured to transmit first indication information to the slave core, where the first indication information instructs the slave core to update the parameter stored in a first register to the difference parameter. The multiple registers include the first register and a second register, and the second register stores the same partial parameters. The slave core is specifically configured to update, based on the first indication information, the parameter stored in the first register with the difference parameter, and to run the second function using the difference parameter and the same partial parameters as the input parameters of the second function.
  • By transmitting the first indication information, the master core enables the slave core to run the second function using the difference parameters and the same partial parameters as its input parameters. Compared with transmitting the complete input parameters of the second function, this reduces the transfer overhead.
  • The first indication information includes multiple bits corresponding to the multiple registers, and each bit indicates whether the slave core uses the difference parameter to update the parameter stored in the at least one register corresponding to that bit.
  • Because the bits of the first indication information indicate whether the parameters stored in the registers have changed, identical input parameters need not be sent again, which reduces transmission overhead.
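The slave-core side of this partial update can be sketched as a bitmask applied to a register array. Assumptions for this sketch: one bit per register (the text allows a bit to cover at least one register), bit value 1 means "overwrite with the next difference parameter", difference parameters arrive in ascending register order, and `apply_difference` is a hypothetical name.

```python
from typing import List


def apply_difference(registers: List[int], mask: int, diff_params: List[int]) -> List[int]:
    """Return register contents after applying difference parameters under a bitmask.

    Bit i of `mask` = 1: register i takes the next difference parameter.
    Bit i of `mask` = 0: register i keeps the first function's parameter.
    """
    needed = sum((mask >> i) & 1 for i in range(len(registers)))
    if needed != len(diff_params):
        raise ValueError("mask and difference parameters disagree")
    updated = list(registers)
    values = iter(diff_params)
    for i in range(len(updated)):
        if (mask >> i) & 1:
            updated[i] = next(values)
    return updated
```

With registers `[10, 20, 30, 40]`, mask `0b0101` and difference parameters `[11, 33]`, registers 0 and 2 change and registers 1 and 3 keep the first function's values.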
  • When the master core determines that the input parameters of the second function differ from those of the first function, it transmits the input parameters of the second function to the slave core and controls the slave core to run the second function based on them.
  • The slave core is configured to receive the input parameters of the second function, passed by the master core, while it is running the first function.
  • Because the slave core can receive the input parameters of the second function while running the first function, it runs multiple functions more efficiently.
  • The processor further includes a first-in-first-out (FIFO) queue, and the master core is specifically configured to send the first information and the second information to the slave core through the FIFO queue.
  • Because the master core sends the first information and the second information through the same queue, the slave core can obtain functions more efficiently.
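A bounded FIFO that carries the first and second information in send order might look like the sketch below. `FunctionInfo`, `InfoFifo`, the queue depth, and the back-pressure behaviour (returning `False` when full) are assumptions for illustration; the field set follows the function information described in this section.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class FunctionInfo:
    address: int   # where the slave core fetches the function from
    length: int    # length of the function
    slot_id: int   # parameter-buffer slot holding the function's inputs
    release: bool  # second indication information: free the slot after the run?


class InfoFifo:
    """Bounded FIFO carrying function information from master core to slave core."""

    def __init__(self, depth: int) -> None:
        self._queue: Deque[FunctionInfo] = deque()
        self._depth = depth

    def push(self, info: FunctionInfo) -> bool:
        """Master side: enqueue; returns False when the queue is full."""
        if len(self._queue) >= self._depth:
            return False
        self._queue.append(info)
        return True

    def pop(self) -> Optional[FunctionInfo]:
        """Slave side: dequeue in send order; None when nothing is pending."""
        return self._queue.popleft() if self._queue else None
```

Because both pieces of information travel through one queue, the slave core sees the first function before the second without any extra ordering logic.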
  • The processor further includes a parameter buffer area, and the master core is specifically configured to transfer the input parameters of the first function through the parameter buffer area.
  • The slave core is further configured to copy the input parameters of the first function from the parameter buffer area into multiple registers of the slave core. The first information includes the address of the first function, the length of the first function, a first slot identifier, and second indication information; the first slot identifier identifies the first slot of the parameter buffer area in which the input parameters of the first function are stored.
  • The second indication information indicates whether to release the first slot after the slave core finishes running the first function. When the input parameters of the second function are the same as those of the first function, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first function, and the second slot identifier is wholly or partly the same as the first slot identifier. When the input parameters of the second function differ from those of the first function, the second indication information specifically indicates that the slave core releases the first slot after running the first function. The second information includes the address of the second function, the length of the second function, and a second slot identifier, which identifies the second slot of the parameter buffer area in which the input parameters of the second function are stored.
  • When the input parameters of the first function and the second function are all the same, the master core transmits the same slot identifier as for the first function, so that the slave core can correctly obtain the input parameters of the second function.
  • Through the second indication information, the master core indicates whether the slave core releases a slot after running a function. This prevents parameters in a slot from being overwritten by mistake, which would otherwise leave the slave core unable to run the function correctly.
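The release/retain rule for parameter-buffer slots can be modelled with a small class. This is a hypothetical software model (class and method names, slot count, and the error raised on a premature overwrite are all assumptions); it only illustrates why retaining a slot lets a second function reuse the same slot identifier safely.

```python
from typing import List, Optional


class ParameterBuffer:
    """Toy slotted parameter buffer with a release/retain rule per slot."""

    def __init__(self, num_slots: int) -> None:
        self._slots: List[Optional[List[int]]] = [None] * num_slots
        self._in_use = [False] * num_slots

    def _check(self, slot_id: int) -> None:
        if not 0 <= slot_id < len(self._slots):
            raise IndexError(f"no slot {slot_id}")

    def write(self, slot_id: int, params: List[int]) -> None:
        """Master side: fill a slot; refuses to overwrite a slot that was not released."""
        self._check(slot_id)
        if self._in_use[slot_id]:
            raise RuntimeError(f"slot {slot_id} still holds parameters that must be kept")
        self._slots[slot_id] = list(params)
        self._in_use[slot_id] = True

    def read(self, slot_id: int) -> List[int]:
        """Slave side: fetch the parameters named by a slot identifier."""
        self._check(slot_id)
        params = self._slots[slot_id]
        if params is None:
            raise RuntimeError(f"slot {slot_id} is empty")
        return list(params)

    def finish(self, slot_id: int, release: bool) -> None:
        """Called when the slave core finishes a function that used this slot."""
        self._check(slot_id)
        if release:
            self._in_use[slot_id] = False  # contents may now be overwritten
```

In this model, finishing the first function with `release=False` keeps slot 0 intact, so a second function carrying slot identifier 0 reads exactly the same parameters, and any attempt to reuse the slot too early is rejected.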
  • A second aspect of the embodiments of the present application provides a communication method, which can be applied to a processor that includes a master core and a slave core.
  • The method includes: the master core transfers the input parameters of a first function to the slave core; the master core sends first information to the slave core, where the first information is used by the slave core to obtain the first function; the slave core obtains the first function according to the first information and runs the first function according to its input parameters; the master core sends second information to the slave core, where the second information is used by the slave core to obtain a second function; the slave core obtains the second function according to the second information; and when the master core determines that the input parameters of the second function are the same as those of the first function, it controls the slave core to run the second function based on the input parameters of the first function.
  • That the input parameters of the second function are the same as those of the first function includes the case in which the input parameters of the second function and those of the first function share the same partial parameters. The method further includes: the master core transfers a difference parameter to the slave core and controls the slave core to run the second function based on the difference parameter and the input parameters of the first function.
  • The difference parameter is a parameter, among the input parameters of the second function, that differs from the input parameters of the first function.
  • The method further includes: the master core delivers first indication information to the slave core, where the first indication information instructs the slave core to update the parameter stored in a first register to the difference parameter. The multiple registers, which store the input parameters of the first function, include the first register and a second register, and the second register stores the same partial parameters. Based on the first indication information, the slave core updates the parameter stored in the first register with the difference parameter and runs the second function using the difference parameter and the same partial parameters as the input parameters of the second function.
  • The first indication information includes multiple bits corresponding to the multiple registers, and each bit indicates whether the slave core uses the difference parameter to update the parameter stored in the at least one register corresponding to that bit.
  • The method further includes: when the master core determines that the input parameters of the second function differ from those of the first function, it transmits the input parameters of the second function to the slave core and controls the slave core to run the second function based on them.
  • While the slave core runs the first function, it receives the input parameters of the second function passed by the master core.
  • The processor further includes a first-in-first-out (FIFO) queue.
  • That the master core sends the first information to the slave core includes: the master core sends the first information to the slave core through the FIFO queue.
  • That the master core sends the second information to the slave core includes: the master core sends the second information to the slave core through the FIFO queue.
  • The processor further includes a parameter buffer area. That the master core transfers the input parameters of the first function to the slave core includes: the master core passes the input parameters of the first function through the parameter buffer area. The method further includes: the slave core stores the input parameters of the first function from the parameter buffer area in multiple registers of the slave core. The first information includes the address of the first function, the length of the first function, a first slot identifier, and second indication information; the first slot identifier identifies the first slot of the parameter buffer area in which the input parameters of the first function are stored, and the second indication information indicates whether to release the first slot after the slave core finishes running the first function.
  • When the input parameters of the second function are the same as those of the first function, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first function, and the second slot identifier is wholly or partly the same as the first slot identifier. When the input parameters of the second function differ from those of the first function, the second indication information specifically indicates that the first slot is released after the slave core finishes running the first function.
  • The second information includes the address of the second function, the length of the second function, and a second slot identifier, which identifies the second slot of the parameter buffer area in which the input parameters of the second function are stored.
  • The master core in the processor is configured to, when it determines that the input parameters of the second function are the same as those of the first function, control the slave core to run the second function based on the input parameters of the first function.
  • This reduces the overhead of transferring the input parameters of the second function from the master core to the slave core, and reduces the time the slave core takes to run the second function.
  • FIG. 1 is a schematic structural diagram of a communication device provided by the present application.
  • FIG. 2 is a schematic structural diagram of a processor provided by the present application.
  • FIG. 3 is a schematic flowchart of a communication method provided by the present application.
  • The embodiments of the present application provide a processor and a communication method.
  • The master core in the processor can control the slave core to run the second function based on the input parameters of the first function, which reduces the overhead of passing the input parameters of the second function and shortens the time the slave core takes to run it.
  • FIG. 1 is a schematic diagram of a hardware structure of a communication device provided by an embodiment of the present application.
  • The communication device 100 shown in FIG. 1 includes a processor 101, a communication port 102, and a memory 103.
  • The processor 101, the communication port 102, and the memory 103 may be connected to each other through a bus.
  • The communication device may be a vehicle-mounted visual perception device, a terminal device, or another device.
  • The terminal device may include a head-mounted display (HMD) device, which may be a combination of a virtual reality (VR) box and a terminal, a VR all-in-one headset, a personal computer (PC), an augmented reality (AR) device, a mixed reality (MR) device, or the like.
  • The terminal device may also include a cellular phone, a smartphone, a personal digital assistant (PDA), a tablet computer, a laptop computer, a personal computer (PC), a vehicle-mounted terminal, and the like, which is not limited here.
  • The processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits configured to execute related programs to implement the functions that the units in the communication device in the embodiments of the present application are required to perform.
  • The processor 101 may also be an integrated circuit chip with signal processing capability.
  • The processor 101 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The processor 101 includes a master core and a slave core. For a detailed description of the processor 101, refer to FIG. 2; details are not repeated here.
  • The memory 103 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • The memory 103 can store programs. When the programs stored in the memory 103 are executed by the processor 101, the processor 101 and the communication port 102 perform the functions of the communication device 100.
  • The communication port 102 uses a transceiver apparatus, for example but not limited to a transceiver, to implement communication between the communication device 100 and other devices or communication networks.
  • Although FIG. 1 shows only a memory, a processor, and a communication interface of the communication device 100, a person skilled in the art should understand that, in a specific implementation, the communication device 100 also includes other components necessary for normal operation. Depending on specific requirements, the communication device 100 may also include hardware components that implement other additional functions. A person skilled in the art should further understand that the communication device 100 may include only the components necessary to implement the embodiments of the present application, and does not necessarily include all the components shown in FIG. 1.
  • FIG. 2 is a schematic structural diagram of a processor.
  • The processor may be the processor in the communication device of FIG. 1.
  • The processor 101 can also be understood as a multi-core processor.
  • The processor 101 may include a master core 1011, a slave core 1012, a first-in-first-out (FIFO) queue 1013, and a parameter buffer area 1014.
  • FIFO: first in, first out
  • The master core 1011 can also be understood as a main processor, and the slave core 1012 as a vector processor.
  • The slave core 1012 obtains tasks from the master core 1011 and runs them in parallel with the master core 1011. In other words, the master core 1011 and the slave core 1012 can perform logic computations in parallel, which increases the overall computation speed of the processor 101.
  • The slave core 1012 includes a Pong scalar register 10121, a Ping scalar register 10122, and an instruction buffer area 10123.
  • The master core 1011 is configured to transfer the input parameters of a function to the slave core 1012 through the parameter buffer area 1014.
  • The master core 1011 is further configured to send function information to the slave core 1012 through the FIFO queue 1013; the slave core 1012 uses this information to obtain the function.
  • The slave core 1012 is configured to obtain the function according to the function information and to execute the function according to its input parameters.
  • The FIFO queue 1013 is used by the master core 1011 to transmit information about at least one vector function (VF) to the slave core 1012. The information of each VF includes the address of the VF (for example, its base address), the length of the VF, the slot identifier of the VF, and indication information.
  • The address and length of the VF are used by the slave core to obtain the VF, and the slot identifier of the VF identifies the slot of the parameter buffer area 1014 in which the input parameters of the VF are stored.
  • The indication information indicates whether to release the slot holding the input parameters of the VF after the slave core 1012 finishes running the VF.
  • In other words, the indication information indicates whether the input parameters of the VF stored in the slot may be overwritten after the slave core 1012 finishes running the VF.
  • The slave core 1012 may obtain the VF according to the address and length of the VF.
  • The slave core 1012 can also determine, according to the slot identifier of the VF, which slots of the parameter buffer area 1014 store the parameters to be used as the input parameters of the VF.
  • The slave core 1012 may then run the VF according to the input parameters of the VF.
  • The base address of the VF points to the location in external memory where the VF is stored.
  • The slave core fetches the VF from external memory based on the base address and the length.
  • The base address of the VF is 48 bits, the length of the VF is 12 bits, and the slot identifier of the VF is 4 bits.
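The three field widths add up to exactly 64 bits (48 + 12 + 4), so one VF descriptor could fit in a single 64-bit FIFO entry. The packing below is a sketch only: the bit order is an assumption, and the indication information is left out because the text does not give its width or position. Note also that a 4-bit slot identifier can address at most 16 slots.

```python
BASE_BITS, LEN_BITS, SLOT_BITS = 48, 12, 4  # widths given in the text


def pack_vf_info(base: int, length: int, slot_id: int) -> int:
    """Pack into 64 bits as [slot_id:4 | length:12 | base:48] (assumed bit order)."""
    for value, bits, name in ((base, BASE_BITS, "base address"),
                              (length, LEN_BITS, "length"),
                              (slot_id, SLOT_BITS, "slot identifier")):
        if not 0 <= value < 1 << bits:
            raise ValueError(f"{name} does not fit in {bits} bits")
    return base | (length << BASE_BITS) | (slot_id << (BASE_BITS + LEN_BITS))


def unpack_vf_info(word: int) -> tuple:
    """Inverse of pack_vf_info: returns (base, length, slot_id)."""
    base = word & ((1 << BASE_BITS) - 1)
    length = (word >> BASE_BITS) & ((1 << LEN_BITS) - 1)
    slot_id = (word >> (BASE_BITS + LEN_BITS)) & ((1 << SLOT_BITS) - 1)
    return base, length, slot_id
```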
  • The parameter buffer area 1014 includes multiple slots, which are used to store the input parameters of VFs.
  • The parameter buffer area 1014 is used by the master core 1011 to transmit the input parameters of at least one VF to the slave core 1012.
  • The process may include the following: the master core 1011 writes the input parameters of the VF into the parameter buffer area 1014.
  • The slave core 1012 copies the parameters stored in the slot of the parameter buffer area 1014 indicated by the slot identifier into the Pong scalar register 10121.
  • When the slave core 1012 is ready to run the VF, the parameters in the Pong scalar register 10121 are copied to the Ping scalar register 10122, and the slave core 1012 then runs the VF using the parameters in the Ping scalar register 10122.
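The slot-to-Pong-to-Ping sequence can be modelled with two register arrays. In this hypothetical sketch (`PingPongRegisters` and its methods are illustrative names, and the register count is arbitrary), staging the next VF's parameters into Pong does not disturb the Ping copy that the running VF reads, which is what lets the slave core receive the next function's inputs while the current one runs.

```python
from typing import List


class PingPongRegisters:
    """Toy model of the slave core's Pong (staging) and Ping (active) scalar registers."""

    def __init__(self, num_regs: int) -> None:
        self.pong = [0] * num_regs  # inputs of the next VF, copied from a slot
        self.ping = [0] * num_regs  # inputs of the VF currently running

    def stage(self, slot_params: List[int]) -> None:
        """Copy a parameter-buffer slot into Pong (can overlap with a running VF)."""
        if len(slot_params) > len(self.pong):
            raise ValueError("slot holds more parameters than there are registers")
        self.pong[: len(slot_params)] = slot_params

    def launch(self) -> List[int]:
        """When the slave core is ready to run the VF: copy Pong to Ping and use Ping."""
        self.ping = list(self.pong)
        return list(self.ping)
```

A typical sequence is `stage` for the first VF, `launch` to run it, `stage` for the second VF while the first is still running, then `launch` again.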
  • The FIFO queue 1013 and the parameter buffer area 1014 are used by the master core 1011 and the slave core 1012 to transmit VF information and VF input parameters.
  • The positions of the FIFO queue 1013 and the parameter buffer area 1014 are not limited here.
  • The FIFO queue 1013 and the parameter buffer area 1014 can be located in the memory of the master core 1011, in the memory of the slave core 1012, or outside both cores in a storage area of the processor 101 (the structure shown in FIG. 1).
  • The Pong scalar register 10121 and the Ping scalar register 10122 are essentially two scalar registers used for data buffering. Using both at the same time enables continuous data transfer and therefore a higher transfer rate. Because data held in a single scalar register is easily overwritten during transfer and processing, this structure keeps one scalar register in use while the other stores incoming data; that is, the two scalar registers are read and written alternately.
  • The Ping scalar register 10122 can store the input parameters of the VF that the slave core 1012 is about to execute.
  • The Pong scalar register 10121 can store the input parameters of the next VF to be executed by the slave core 1012.
  • The Pong scalar register 10121 and the Ping scalar register 10122 each include multiple registers, which store the parameters held in the slots indicated by the slot identifiers of the VF.
  • The Pong scalar register 10121 and the Ping scalar register 10122 each include 64 2-byte registers (s0-s63) or 32 4-byte registers (S0-S31).
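Since 64 × 2 bytes and 32 × 4 bytes are both 128 bytes, one natural reading is that s0-s63 and S0-S31 are two views of the same storage. The sketch below models that reading with Python's `struct` module; the overlay (S<k> occupies the bytes of s<2k> as the low half and s<2k+1> as the high half, little-endian) and the class and method names are assumptions, not something the text specifies.

```python
import struct


class ScalarRegisterFile:
    """A 128-byte scalar register bank viewed as s0-s63 (2 bytes) or S0-S31 (4 bytes).

    Assumed overlay: S<k> shares bytes with s<2k> (low half) and s<2k+1> (high half).
    """

    SIZE = 128

    def __init__(self) -> None:
        self.mem = bytearray(self.SIZE)

    @staticmethod
    def _check(index: int, count: int) -> None:
        if not 0 <= index < count:
            raise IndexError(f"register index {index} out of range 0..{count - 1}")

    def read_s16(self, i: int) -> int:  # s<i>
        self._check(i, 64)
        return struct.unpack_from("<H", self.mem, 2 * i)[0]

    def write_s16(self, i: int, value: int) -> None:
        self._check(i, 64)
        struct.pack_into("<H", self.mem, 2 * i, value)

    def read_s32(self, k: int) -> int:  # S<k>
        self._check(k, 32)
        return struct.unpack_from("<I", self.mem, 4 * k)[0]

    def write_s32(self, k: int, value: int) -> None:
        self._check(k, 32)
        struct.pack_into("<I", self.mem, 4 * k, value)
```

Under this assumption, writing `0xAAAABBBB` to S1 makes s2 read `0xBBBB` and s3 read `0xAAAA`.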
  • The instruction buffer area 10123 stores the instructions located at the base address; the slave core 1012 uses these instructions to execute the VF.
  • The base address points to the address in external memory where the VF is stored.
  • The external memory may be the memory 103 in FIG. 1.
  • The following describes the functions of the structures in the processor 101 in various situations, using two functions that are VFs as an example. It can be understood that, in this embodiment, more VFs and more VF information can be transferred between the master core 1011 and the slave core 1012; two functions are used here only for illustration.
  • The master core 1011 is configured to transmit the input parameters of a first VF to the slave core 1012 through the parameter buffer area 1014.
  • The master core 1011 is further configured to send first information to the slave core 1012 through the FIFO queue 1013, where the first information is used by the slave core 1012 to obtain the first VF.
  • The slave core 1012 is configured to obtain the first VF according to the first information, and to run the first VF according to the input parameters of the first VF.
  • The master core 1011 is further configured to send second information to the slave core 1012 through the FIFO queue 1013, where the second information is used by the slave core 1012 to obtain a second VF.
  • The slave core 1012 is configured to obtain the second VF according to the second information.
  • The first information includes the base address of the first VF, the length of the first VF, a first slot identifier, and second indication information. The base address of the first VF points to the first VF; the first slot identifier identifies the first slot of the parameter buffer area 1014 in which the input parameters of the first VF are stored; and the second indication information indicates whether to release the first slot after the slave core 1012 finishes running the first VF.
  • Releasing the first slot means that the parameters in the first slot may be overwritten by subsequent parameters; not releasing the first slot means that the parameters in the first slot are protected from being overwritten by subsequent parameters.
  • The second information includes the base address of the second VF, the length of the second VF, and a second slot identifier. The base address of the second VF points to the second VF, and the second slot identifier identifies the second slot of the parameter buffer area 1014 in which the input parameters of the second VF are stored.
  • Releasing the second slot means that the parameters in the second slot may be overwritten by subsequent parameters; not releasing the second slot means that the parameters in the second slot are protected from being overwritten by subsequent parameters.
  • The second information may further include third indication information, which indicates whether to release the second slot after the slave core 1012 finishes running the second VF.
  • The functions of the master core 1011 and the slave core 1012 differ across several situations, which are described below.
  • The input parameters of the second VF are the same as those of the first VF.
  • This situation is divided into two cases, described below.
  • The input parameters of the second VF are entirely the same as those of the first VF.
  • The master core 1011 is further configured to, when it determines that the input parameters of the first VF are the same as those of the second VF, control the slave core 1012 to run the second VF based on the input parameters of the first VF.
  • In this case, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first VF, and the second slot identifier of the second VF is the same as the first slot identifier.
  • The master core 1011 can control the slave core 1012 to run the second VF based on the input parameters of the first VF in several specific ways:
  • The second indication information in the first information may also indicate that the input parameters of the first VF are the same as those of the second VF.
  • The second indication information in the first information may also indicate that the input parameters of the second VF reuse the slot in which the input parameters of the first VF are stored.
  • The master core 1011 is configured to transmit first indication information to the slave core 1012 through the parameter buffer area 1014. The first indication information includes multiple bits corresponding to multiple registers in the Pong scalar register 10121, and each bit indicates that the parameter stored in the at least one register corresponding to that bit has not changed. The multiple registers in the Pong scalar register 10121 store the input parameters of the first VF.
  • The first indication information is 32 bits, and each bit corresponds to two 2-byte registers in the Pong scalar register 10121; that is, the first indication information corresponds to the 64 2-byte registers (s0-s63) or to the 32 4-byte registers (S0-S31).
  • The first indication information may use either 0 or 1 to indicate that the parameter stored in the register of the Pong scalar register 10121 corresponding to a bit is unchanged; the following description uses only 0 to mean unchanged.
  • Specifically, when the first indication information is 0x0000, the corresponding binary number is (0000,0000,0000,0000). Counting bits from right to left starting at 0, bits 0 to 15 are all 0, so the parameters stored in the two s registers corresponding to each bit remain unchanged, or the parameter stored in the one S register corresponding to each bit remains unchanged.
  • the input parameters of the first VF are the same as the input parameters of the second VF, it can be understood that the parameters of the second VF and the first VF are the same, specifically, the parameters at the same position may be the same.
  • the input parameters of the second VF are partly the same as those of the first VF.
  • the second indication information in the first information is specifically used to indicate that the parameters stored in the first slot are retained after the slave core 1012 finishes running the first VF; part of the second slot identifier is the same as the first slot identifier, and another part is different.
  • when the master core 1011 determines that the input parameters of the second VF are partly the same as those of the first VF, it is further configured to transmit the difference parameters and the first indication information to the slave core 1012 through the parameter buffer 1014, and to control the slave core 1012 to run the second VF based on the difference parameters, the first indication information, and the input parameters of the first VF.
  • the difference parameters are those input parameters of the second VF that differ from the input parameters of the first VF.
  • the first indication information includes a plurality of bits corresponding to a plurality of registers in the Pong scalar register 10121, and each bit indicates whether the parameter stored in the at least one register corresponding to that bit has changed.
  • the first indication information is 32 bits, and each bit corresponds to two 2-byte registers in the Pong scalar register 10121; that is, the first indication information covers 64 2-byte registers (s0-s63), or equivalently 32 4-byte registers (S0-S31).
  • the first indication information can use 0 or 1 to indicate whether the parameter stored in the corresponding registers of the Pong scalar register 10121 has changed; the following description uses 1 for "changed" and 0 for "unchanged" as an example.
  • for example, the first indication information is 0x4218, whose binary form is (0100,0010,0001,1000). Counting from the right and starting at 0, bits 3, 4, 9, and 14 are 1. Each 1 bit means that the parameters stored in the corresponding 2 s registers (or 1 S register) of the Pong scalar register 10121 have changed, and each 0 bit means that the parameters stored in the corresponding 2 s registers (or 1 S register) are unchanged.
  • that is, the parameters of the s6 and s7 registers (bit 3), the s8 and s9 registers (bit 4), the s18 and s19 registers (bit 9), and the s28 and s29 registers (bit 14) have changed.
  • the parameters stored in the registers corresponding to 0 bits remain unchanged.
  • the difference parameters give the new values to be written into the registers whose corresponding bits are 1.
  • the parameters transmitted from the master core 1011 to the slave core 1012 include the first indication information and the difference parameters.
  • the first 32 bits of these parameters are the first indication information.
  • the bits that follow carry the difference parameters:
  • the second 32 bits are the parameter to be written into the s6 and s7 registers;
  • the third 32 bits are the parameter to be written into the s8 and s9 registers;
  • the fourth 32 bits are the parameter to be written into the s18 and s19 registers;
  • the fifth 32 bits are the parameter to be written into the s28 and s29 registers.
  • that the input parameters of the first VF are partly the same as those of the second VF can be understood as the parameters of the two VFs being partly the same; specifically, some of the parameters at the same positions are the same.
  • the second case is that the input parameters of the second VF are different from the input parameters of the first VF.
  • the second indication information in the first information is specifically used to indicate that the first slot is released after the slave core 1012 finishes running the first VF. This can be understood as allowing the parameters in this slot to be overwritten by subsequent parameters, or as freeing the first slot so that the master core 1011 can fill it with the input parameters of the second VF.
  • when the master core 1011 determines that the input parameters of the first VF are different from those of the second VF, it is further configured to transfer the input parameters of the second VF to the slave core 1012 through the parameter buffer 1014, and to control the slave core 1012 to run the second VF based on the input parameters of the second VF.
  • for example, the input parameters of the first VF are 1, 2, 3, 4, 5, and the input parameters of the second VF are 5, 4, 3, 2, 1. The same values appear in reverse order, so apart from the middle value 3, no position holds the same parameter in both VFs.
  • the input parameters of the second VF are different from the input parameters of the first VF, which can be understood as the second VF does not have the same parameters as the first VF, and can also be understood as the same position does not have the same parameters.
  • the master core 1011 writes the input parameters of the first VF into the parameter buffer 1014, and writes the information of the first VF into the FIFO queue 1013.
  • the information of the first VF includes a base address of the first VF, a length of the first VF, a first slot identifier of the first VF, and second indication information.
  • the slave core 1012 copies the parameters stored in the first slot corresponding to the first slot ID in the parameter buffer 1014 to the Pong scalar register 10121 according to the first slot ID.
  • the slave core 1012 copies the input parameters of the first VF stored in the Pong scalar register 10121 to the Ping scalar register 10122.
  • the slave core 1012 can run the first VF based on the input parameters of the first VF stored in the Ping scalar register 10122.
  • the master core 1011 can write the input parameters of the second VF into the parameter buffer 1014, and write the information of the second VF into the FIFO queue 1013.
  • the information of the second VF includes a base address of the second VF, a length of the second VF, a second slot identifier of the second VF, and third indication information.
  • after the slave core 1012 detects the new information of the second VF in the FIFO queue 1013, it copies the parameters stored in the second slot corresponding to the second slot identifier in the parameter buffer 1014 to the Pong scalar register 10121. After the slave core 1012 finishes running the first VF, the input parameters of the second VF stored in the Pong scalar register 10121 are copied to the Ping scalar register 10122. After obtaining the second VF based on the information of the second VF, the slave core 1012 may run the second VF based on the input parameters of the second VF stored in the Ping scalar register 10122. When there are more than two VFs, the slave core repeats this cycle of running the first VF and the second VF, which is not described again here.
  • the master core 1011 is configured to control the slave core 1012 to run the second function based on the input parameters of the first function when determining that the input parameters of the second function are the same as the input parameters of the first function.
  • this reduces the overhead of passing the input parameters of the second function and the time the slave core 1012 takes to run the second function.
  • when the master core 1011 determines that the input parameters of the second function are partly the same as those of the first function, it is further configured to transmit the difference parameters and the first indication information to the slave core 1012 through the parameter buffer 1014, and to control the slave core 1012 to run the second function based on the difference parameters, the first indication information, and the input parameters of the first function.
  • the difference parameters and the first indication information occupy less storage space than the full input parameters of the second function.
  • this reduces the overhead of passing the input parameters of the second function and the time the slave core 1012 takes to run the second function.
  • the storage is divided into the FIFO queue 1013, the parameter buffer 1014, the Pong scalar register 10121, and the Ping scalar register 10122, so that the slave core 1012 can receive the input parameters of the second function while it is running the first function. That is, the two scalar registers are read and written alternately, which improves the efficiency with which the slave core 1012 runs multiple functions.
  • the processor provided in the embodiments of this application is described above; the communication method provided in the embodiments of this application is described below with reference to the processor architecture in FIG. 2.
  • an embodiment of the communication method provided in the embodiments of this application includes steps 301 to 310.
  • Step 301: the master core transmits the input parameters of the first function to the slave core.
  • the master core stores the input parameters of the first function (for example, the first VF) in the parameter buffer.
  • the slave core can acquire the parameters of the first VF from the parameter buffer.
  • the slave core reads the parameters of the first VF from the slot in the parameter buffer area, and stores the parameters of the first VF in the Pong register of the slave core.
  • the Pong scalar registers include 64 2-byte registers (s0-s63) or 32 4-byte registers (S0-S31).
  • Step 302: the master core sends first information to the slave core.
  • the master core sends the first information to the slave core through the FIFO queue.
  • the first information includes the base address of the first VF, the length of the first VF, the first slot ID, and the second indication information. The base address of the first VF points to the first VF; the first slot ID identifies the first slot in the parameter buffer in which the input parameters of the first VF are stored; and the second indication information indicates whether to release the first slot after the slave core finishes running the first VF.
  • releasing the first slot means that the parameters in the first slot may be overwritten by subsequent parameters; not releasing it means that the parameters in the first slot are not overwritten by subsequent parameters.
  • the base address of the first VF is 48 bits, the length is 12 bits, and the first slot identifier is 4 bits.
  • Step 303: the slave core acquires the first function based on the first information.
  • after receiving the first information, the slave core may acquire the first VF based on the base address and length of the first VF in the first information.
  • the slave core obtains the first VF from the memory shown in FIG. 1 based on the base address and length of the first VF in the first information.
  • Step 304: the slave core runs the first function based on the input parameters of the first function.
  • the slave core can use the first slot identifier to read the input parameters of the first VF from the parameter buffer and store them in its Pong register, then copy the parameters in the Pong register to the Ping register, and use the parameters in the Ping register to run the first VF.
  • that is, the first VF is run based on the input parameters of the first VF.
  • Step 305: the master core sends second information to the slave core.
  • the master core sends the second information to the slave core through the FIFO queue.
  • the second information includes the base address of the second VF, the length of the second VF, and the second slot identifier. The base address of the second VF points to the second VF, and the second slot identifier identifies the second slot in the parameter buffer in which the input parameters of the second VF are stored.
  • the second information may further include third indication information, where the third indication information is used to indicate whether to release the second slot after the slave core finishes running the second VF.
  • Step 306: the slave core acquires the second function based on the second information.
  • the slave core may acquire the second VF based on the base address and length of the second VF in the second information.
  • the slave core obtains the second VF from the memory shown in FIG. 1 based on the base address and length of the second VF in the second information.
  • the subsequent steps performed by the master core and the slave core depend on the case. If the input parameters of the first VF are the same as those of the second VF, this embodiment further includes steps 307 and 308. If they are different, this embodiment further includes steps 309 and 310. The cases are described below:
  • Case 1: the input parameters of the second VF are the same as the input parameters of the first VF.
  • Step 307: when the master core determines that the input parameters of the second function are the same as those of the first function, it controls the slave core to run the second function based on the input parameters of the first function. This step is optional.
  • the second indication information is specifically used to indicate that the parameters stored in the first slot are retained after the slave core finishes running the first VF, and the second slot ID of the second VF is the same as the first slot ID.
  • the input parameters of the second VF are the same as the input parameters of the first VF, which are divided into two cases, which are described below:
  • the input parameters of the second VF are all the same as those of the first VF.
  • the second indication information in the first information may also be used to indicate that the input parameters of the first VF are the same as the input parameters of the second VF.
  • the second indication information in the first information may also be used to indicate that the input parameters of the second VF reuse the slot where the input parameters of the first VF are located.
  • the master core transmits the first indication information to the slave core through the parameter buffer area.
  • the first indication information includes a plurality of bits corresponding to a plurality of registers in the Pong scalar register, and each bit indicates that the parameter stored in the at least one register corresponding to that bit has not changed. The registers in the Pong scalar register are used to store the input parameters of the first VF.
  • the first indication information is 32 bits, and each bit corresponds to two 2-byte registers in the Pong scalar register; that is, the first indication information covers 64 2-byte registers (s0-s63), or equivalently 32 4-byte registers (S0-S31).
  • the first indication information may use 0 or 1 to indicate that the parameter stored in the register corresponding to the bit in the Pong scalar register remains unchanged.
  • the following description only uses 0 to indicate that the parameter does not change as an example.
  • for example, the first indication information is 0x0000, whose binary form is (0000,0000,0000,0000). Counting from the right and starting at 0, bits 0 to 15 are all 0, so the parameters stored in the two s registers (or the one S register) corresponding to each bit remain unchanged.
  • that the input parameters of the first VF are the same as the input parameters of the second VF can be understood as the parameters of the two VFs being the same; specifically, the parameters at the same positions are the same.
  • the input parameters of the second VF are partly the same as those of the first VF.
  • the second indication information in the first information is specifically used to indicate that the parameters stored in the first slot are retained after the slave core finishes running the first VF, so that the input parameters of the first function can be correctly reused later.
  • part of the second slot identifier is the same as the first slot identifier, and another part is different.
  • when the master core determines that the input parameters of the second VF are partly the same as those of the first VF, it transmits the difference parameters and the first indication information to the slave core through the parameter buffer, and controls the slave core to run the second VF based on the difference parameters, the first indication information, and the input parameters of the first VF.
  • the difference parameters are those input parameters of the second VF that differ from the input parameters of the first VF.
  • the first indication information includes a plurality of bits corresponding to a plurality of registers in the Pong scalar register, and each bit indicates whether the parameter stored in the at least one register corresponding to that bit has changed.
  • the first indication information is 32 bits, and each bit corresponds to two 2-byte registers in the Pong scalar register; that is, the first indication information covers 64 2-byte registers (s0-s63), or equivalently 32 4-byte registers (S0-S31).
  • the first indication information may use 0 or 1 to indicate whether the parameter stored in the register corresponding to the bit in the Pong scalar register has changed.
  • the following description only uses 1 to indicate that there is a change and 0 to indicate that the parameter does not change.
  • for example, the first indication information is 0x4218, whose binary form is (0100,0010,0001,1000). Counting from the right and starting at 0, bits 3, 4, 9, and 14 are 1. Each 1 bit means that the parameters stored in the corresponding 2 s registers (or 1 S register) of the Pong scalar register have changed, and each 0 bit means that the parameters stored in the corresponding 2 s registers (or 1 S register) are unchanged.
  • that is, the parameters of the s6 and s7 registers (bit 3), the s8 and s9 registers (bit 4), the s18 and s19 registers (bit 9), and the s28 and s29 registers (bit 14) have changed.
  • the parameters stored in the registers corresponding to 0 bits remain unchanged.
  • the difference parameters give the new values to be written into the registers whose corresponding bits are 1.
  • the parameters transferred from the master core to the slave core include the first indication information and the difference parameters.
  • the first 32 bits of these parameters are the first indication information.
  • the second 32 bits are the parameter to be written into the s6 and s7 registers;
  • the third 32 bits are the parameter to be written into the s8 and s9 registers;
  • the fourth 32 bits are the parameter to be written into the s18 and s19 registers;
  • the fifth 32 bits are the parameter to be written into the s28 and s29 registers.
  • that the input parameters of the first VF are partly the same as those of the second VF can be understood as the parameters of the two VFs being partly the same; specifically, some of the parameters at the same positions are the same.
  • Step 308: the slave core runs the second function based on the input parameters of the first function. This step is optional.
  • the slave core may run the second VF using the input parameters of the first VF according to the second indication information.
  • the second indication information is used to indicate that the input parameters of the first VF are the same as the input parameters of the second VF, or indicate that the input parameters of the second VF reuse the slot where the input parameters of the first VF are located.
  • Case 2: the input parameters of the second VF are different from the input parameters of the first VF.
  • Step 309: when the master core determines that the input parameters of the second function are different from those of the first function, it transfers the input parameters of the second function. This step is optional.
  • the second indication information in the first information is specifically used to indicate that the first slot is released after the slave core finishes running the first VF. Releasing the slot can be understood as allowing the parameters in this slot to be overwritten by subsequent parameters, or as freeing the first slot so that the master core can fill it with the input parameters of the second VF.
  • when the master core determines that the input parameters of the first VF are different from those of the second VF, it transmits the input parameters of the second VF to the slave core through the parameter buffer.
  • for example, the input parameters of the first VF are 1, 2, 3, 4, 5, and the input parameters of the second VF are 5, 4, 3, 2, 1. The same values appear in reverse order, so apart from the middle value 3, no position holds the same parameter in both VFs.
  • the input parameters of the second VF are different from the input parameters of the first VF, which can be understood as the second VF does not have the same parameters as the first VF, and can also be understood as the same position does not have the same parameters.
  • Step 310: the slave core runs the second function based on the input parameters of the second function. This step is optional.
  • the input parameters of the second VF stored in the parameter buffer are stored in the Pong register of the slave core.
  • after the slave core finishes running the first VF, it copies the input parameters of the second VF stored in the Pong register to the Ping register, and uses the input parameters of the second VF stored in the Ping register to run the second VF.
  • the master core can write the input parameters of the second VF into the parameter buffer and write the second information into the FIFO queue. After the slave core detects the new second information in the FIFO queue, it copies the parameters stored in the second slot corresponding to the second slot identifier in the parameter buffer to the Pong scalar register. After the slave core finishes running the first VF, it copies the input parameters of the second VF stored in the Pong scalar register to the Ping scalar register. After acquiring the second VF based on the second information, the slave core may run the second VF based on the input parameters of the second VF stored in the Ping scalar register. When there are more than two VFs, the slave core repeats this cycle of running the first VF and the second VF, which is not described again here.
  • if the input parameters of the first VF are the same as those of the second VF, this embodiment may include steps 301 to 308.
  • if the input parameters of the first VF are different from those of the second VF, this embodiment may include steps 301 to 306, 309, and 310.
  • alternatively, this embodiment may include all of steps 301 to 310.
  • when the master core determines that the input parameters of the second function are the same as those of the first function, it controls the slave core to run the second function based on the input parameters of the first function. This reduces the overhead of passing the input parameters of the second function and the time the slave core takes to run the second function.
  • when the master core determines that the input parameters of the second function are partly the same as those of the first function, it transmits the difference parameters and the first indication information to the slave core through the parameter buffer, and controls the slave core to run the second function based on the difference parameters, the first indication information, and the input parameters of the first function.
  • the difference parameters and the first indication information occupy less storage space than the full input parameters of the second function.
  • this reduces the overhead of passing the input parameters of the second function and the time the slave core takes to run the second function.
  • the slave core is provided with a Pong scalar register and a Ping scalar register, so that it can receive the input parameters of the second function while running the first function. That is, the two scalar registers are read and written alternately, which improves the efficiency with which the slave core runs multiple functions.
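The mask-plus-difference-parameter payload described above can be sketched as a minimal Python model. The first 32 bits carry the first indication information, and each set bit is followed by one 32-bit word. This is not the patent's implementation. The Pong scalar register file is modelled as 64 16-bit values s0-s63, and bit i of the mask is taken to cover s(2i) and s(2i+1). Two further points are assumptions: difference words are ordered from the lowest set bit upward, and the low half of each word goes to the even-numbered register. `apply_payload` and `changed_registers` are hypothetical names.

```python
"""Hypothetical model: the slave core applies first indication information
(a 32-bit mask) plus difference parameters to its Pong scalar registers."""

NUM_S_REGS = 64   # 64 2-byte registers s0..s63 (same storage as 32 4-byte S0..S31)
MASK_BITS = 32    # one mask bit per pair of s registers


def changed_registers(mask):
    """List the s-register indices covered by the set bits of `mask`."""
    regs = []
    for bit in range(MASK_BITS):
        if (mask >> bit) & 1:
            regs.extend([2 * bit, 2 * bit + 1])
    return regs


def apply_payload(pong, payload):
    """Update `pong` in place from payload = [mask, diff_word, ...].

    Assumptions (not specified in the source): one 32-bit difference word
    per set bit, ordered from the lowest set bit to the highest; the low
    16 bits go to the even register and the high 16 bits to the odd one.
    """
    if len(pong) != NUM_S_REGS:
        raise ValueError("expected 64 s registers")
    mask, diffs = payload[0], payload[1:]
    if len(diffs) != bin(mask).count("1"):
        raise ValueError("one difference word is needed per set mask bit")
    regs = changed_registers(mask)
    for k, word in enumerate(diffs):
        pong[regs[2 * k]] = word & 0xFFFF              # even register, e.g. s6
        pong[regs[2 * k + 1]] = (word >> 16) & 0xFFFF  # odd register, e.g. s7
    return pong


# Example from the text: mask 0x4218 sets bits 3, 4, 9 and 14.
pong = [0] * NUM_S_REGS  # input parameters of the first VF (all zero here)
apply_payload(pong, [0x4218, 0x00020001, 0x00040003, 0x00060005, 0x00080007])
print(changed_registers(0x4218))  # [6, 7, 8, 9, 18, 19, 28, 29]
print(pong[6], pong[7], pong[28], pong[29])  # 1 2 7 8
```

Registers whose mask bit is 0 are never touched, so they keep the first VF's values. This is what lets the slave core reuse them for the second VF.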

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Communication Control (AREA)

Abstract

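Embodiments of this application disclose a processor and a communication method. The processor includes a master core and a slave core. The master core is configured to transfer input parameters of a first function to the slave core. The master core is further configured to send first information to the slave core, where the first information is used by the slave core to obtain the first function. The slave core is configured to obtain the first function according to the first information and run the first function according to the input parameters of the first function. The master core is further configured to send second information to the slave core, where the second information is used by the slave core to obtain a second function. The slave core is further configured to obtain the second function according to the second information. The master core is further configured to, when determining that the input parameters of the second function are the same as those of the first function, control the slave core to run the second function based on the input parameters of the first function. When the input parameters of the second function are the same as those of the first function, this reduces the overhead of the master core transferring the input parameters of the second function to the slave core, and reduces the time the slave core takes to run the second function.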

Description

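A processor and a communication method
Technical Field
Embodiments of this application relate to the communication field, and in particular, to a processor and a communication method.
Background
In recent years, the strong performance of convolutional neural networks has drawn attention to new applications such as autonomous driving. Convolutional neural networks require some computer vision (CV) algorithms for pre-processing and post-processing. Because CV algorithms are highly parallel, vector processors are generally used.
Currently, there are several ways to connect a vector processor to a main processor. The first is full decoupling, in which the vector processor is a completely independent core. The second is loose coupling, in which the vector processor obtains tasks from the main processor and runs them in parallel with the main processor until the two synchronize. The third is tight coupling, in which the vector processor can be regarded as part of the main processor.
When the vector processor and the main processor are loosely coupled, improving the efficiency of data transfer between them is a technical problem that urgently needs to be solved.
Summary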
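Embodiments of this application provide a processor and a communication method. The master core in the processor is configured to, when determining that the input parameters of a second function are the same as those of a first function, control the slave core to run the second function based on the input parameters of the first function. This reduces the overhead of transferring the input parameters of the second function and the time the slave core takes to run the second function.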
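A first aspect of the embodiments of this application provides a processor. The processor includes a master core and a slave core, and may be applied in scenarios such as in-vehicle visual perception devices and mobile phone terminals. The master core is configured to transfer input parameters of a first function to the slave core. The master core is further configured to send first information to the slave core, where the first information is used by the slave core to obtain the first function. The slave core is configured to obtain the first function according to the first information and run the first function according to the input parameters of the first function. The master core is further configured to send second information to the slave core, where the second information is used by the slave core to obtain the second function. The slave core is further configured to obtain the second function according to the second information. The master core is further configured to, when determining that the input parameters of the second function are the same as those of the first function, control the slave core to run the second function based on the input parameters of the first function.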
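In this embodiment, when the input parameters of the second function are the same as those of the first function, the overhead of the master core transferring the input parameters of the second function to the slave core can be reduced, and the time the slave core takes to run the second function can be reduced.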
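Optionally, in a possible implementation of the first aspect, the input parameters of the second function being the same as those of the first function includes the case where the two sets share some identical parameters. In that case, the master core is further configured to transfer difference parameters to the slave core and control the slave core to run the second function based on the difference parameters and the input parameters of the first function. The difference parameters are those input parameters of the second function that differ from the input parameters of the first function.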
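In this possible implementation, when the input parameters of the first function and the second function are partly the same, the master core transmits only the first indication information and the difference parameters instead of all the input parameters of the second function. This reduces the overhead of transferring the second function's input parameters.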
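Optionally, in a possible implementation of the first aspect, the slave core is further configured to store the input parameters of the first function in a plurality of registers, which include a first register and a second register; the second register stores the identical partial parameters. The master core is further configured to transfer first indication information to the slave core, instructing it to update the parameters stored in the first register to the difference parameters. The slave core is specifically configured to update, based on the first indication information, the parameters stored in the first register with the difference parameters. It then runs the second function using the difference parameters together with the identical partial parameters as the input parameters of the second function.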
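In this possible implementation, by transferring the first indication information, the master core enables the slave core to run the second function using the difference parameters and the identical partial parameters as its input parameters. Compared with transferring all the input parameters of the second function, this reduces the transfer overhead.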
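Optionally, in a possible implementation of the first aspect, the first indication information includes a plurality of bits corresponding to the plurality of registers. Each bit indicates whether the slave core updates, with the difference parameters, the parameters stored in the at least one register corresponding to that bit.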
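On the master-core side, the first indication information and the difference parameters can be derived by comparing the two parameter sets one register pair at a time. The sketch below is a hypothetical Python model that assumes the register layout given in the embodiment: 64 2-byte s registers, one mask bit per pair. The function names and the word packing (low 16 bits taken from the even register) are assumptions. The sketch also compares the payload size with sending all 64 registers, which is the saving the text describes.

```python
"""Hypothetical master-side encoder for the first indication information
(a 32-bit mask) and the difference parameters."""

NUM_S_REGS = 64                     # s0..s63, 2 bytes each
FULL_PARAM_BYTES = NUM_S_REGS * 2   # sending every register: 128 bytes


def encode_diff(old, new):
    """Compare two 64-register parameter sets; return (mask, diff_words)."""
    if len(old) != NUM_S_REGS or len(new) != NUM_S_REGS:
        raise ValueError("expected 64 s-register values per parameter set")
    mask, diffs = 0, []
    for pair in range(NUM_S_REGS // 2):
        lo, hi = 2 * pair, 2 * pair + 1
        if (old[lo], old[hi]) != (new[lo], new[hi]):
            mask |= 1 << pair
            # Assumed packing: even register in the low half of the word.
            diffs.append((new[lo] & 0xFFFF) | ((new[hi] & 0xFFFF) << 16))
    return mask, diffs


def payload_bytes(diffs):
    """32-bit mask plus one 32-bit word per changed register pair."""
    return 4 * (1 + len(diffs))


first_vf = [0] * NUM_S_REGS
second_vf = list(first_vf)
second_vf[6], second_vf[9], second_vf[18], second_vf[29] = 1, 4, 5, 8
mask, diffs = encode_diff(first_vf, second_vf)
print(hex(mask), payload_bytes(diffs), FULL_PARAM_BYTES)  # 0x4218 20 128
```

In this example, four changed pairs cost 20 bytes instead of 128. Identical parameter sets reduce to a zero mask with no difference words.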
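In this possible implementation, when the input parameters of the first function and the second function are partly the same, the bits of the first indication information indicate whether the parameters stored in the registers have changed. This reduces the transmission overhead of repeatedly sending identical input parameters.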
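Optionally, in a possible implementation of the first aspect, the master core is further configured to, when determining that the input parameters of the second function differ from those of the first function, transfer the input parameters of the second function to the slave core and control the slave core to run the second function based on them. Optionally, the slave core is configured to receive, while running the first function, the input parameters of the second function transferred by the master core.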
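The master core distinguishes three cases: all parameters the same, some the same, or none the same. These can be summarised as a small decision function. In this hypothetical Python sketch, parameters are compared position by position, following the description of parameters "at the same position". The return labels ('reuse', 'diff', 'full') are illustrative and not defined in the text.

```python
"""Hypothetical sketch of the master core's choice for the second function."""


def plan_second_function(first_params, second_params):
    """Return (action, data) describing what the master core sends.

    reuse -> nothing new; the slave runs with the first function's parameters
    diff  -> {position: new value} for the positions that changed
    full  -> the complete parameter list of the second function
    """
    if len(first_params) == len(second_params):
        if first_params == second_params:
            return "reuse", None
        changed = {i: new for i, (old, new)
                   in enumerate(zip(first_params, second_params)) if old != new}
        if len(changed) < len(second_params):  # at least one position matches
            return "diff", changed
    return "full", list(second_params)


print(plan_second_function([1, 2, 3], [1, 2, 3]))  # ('reuse', None)
print(plan_second_function([1, 2, 3], [1, 9, 3]))  # ('diff', {1: 9})
print(plan_second_function([1, 2, 3], [4, 5, 6]))  # ('full', [4, 5, 6])
```

Lists of different lengths are treated here as "different" and sent in full. That is a choice made for the sketch; the text does not address this case.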
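In this possible implementation, the slave core can receive the input parameters of the second function while running the first function, which improves the efficiency with which it runs multiple functions.
Optionally, in a possible implementation of the first aspect, the processor further includes a first in first out (FIFO) queue, and the master core is specifically configured to send the first information and the second information to the slave core through the FIFO queue.
In this possible implementation, the master core can send the first information and the second information to the slave core through the same queue, which can improve the efficiency with which the slave core obtains functions.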
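Optionally, in a possible implementation of the first aspect, the processor further includes a parameter buffer. The master core is specifically configured to transfer the input parameters of the first function through the parameter buffer, and the slave core is further configured to copy those parameters from the parameter buffer into a plurality of its registers. The first information includes the address of the first function, the length of the first function, a first slot identifier, and second indication information. The first slot identifier identifies the first slot in the parameter buffer in which the input parameters of the first function are stored. The second indication information indicates whether to release the first slot after the slave core finishes running the first function. When the input parameters of the second function are the same as those of the first function, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first function, and the second slot identifier is wholly or partly the same as the first slot identifier. When they differ, the second indication information specifically indicates that the first slot is released after the slave core finishes running the first function. The second information includes the address of the second function, the length of the second function, and a second slot identifier. The second slot identifier identifies the second slot in the parameter buffer in which the input parameters of the second function are stored.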
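The first and second information can be pictured as a small descriptor pushed into the FIFO queue. The embodiment gives field widths of 48 bits for the base address, 12 bits for the length, and 4 bits for the slot identifier, 64 bits in total. It does not give their bit positions, or say where the second indication information is carried. The Python sketch below therefore assumes a layout: base address in the low 48 bits, then the length, then the slot ID. It keeps the release flag as a separate field. `VfInfo` and its methods are hypothetical names.

```python
"""Hypothetical encoding of the VF information sent through the FIFO queue."""
from collections import deque
from dataclasses import dataclass

BASE_BITS, LEN_BITS, SLOT_BITS = 48, 12, 4  # field widths from the embodiment


@dataclass(frozen=True)
class VfInfo:
    base: int           # base address of the VF (48 bits)
    length: int         # length of the VF (12 bits)
    slot: int           # slot ID of its parameters in the parameter buffer (4 bits)
    release_slot: bool  # indication information: release the slot after the run?

    def pack(self):
        """Pack base/length/slot into one 64-bit word (assumed bit order)."""
        for value, bits, name in ((self.base, BASE_BITS, "base"),
                                  (self.length, LEN_BITS, "length"),
                                  (self.slot, SLOT_BITS, "slot")):
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{name} does not fit in {bits} bits")
        return (self.base
                | (self.length << BASE_BITS)
                | (self.slot << (BASE_BITS + LEN_BITS)))

    @classmethod
    def unpack(cls, word, release_slot):
        """Recover the fields from a packed word plus the separate flag."""
        base = word & ((1 << BASE_BITS) - 1)
        length = (word >> BASE_BITS) & ((1 << LEN_BITS) - 1)
        slot = (word >> (BASE_BITS + LEN_BITS)) & ((1 << SLOT_BITS) - 1)
        return cls(base, length, slot, release_slot)


# Master core: the first VF keeps slot 3 (the second VF reuses it);
# the second VF asks for the slot to be released afterwards.
fifo = deque()
fifo.append((VfInfo(0x0000_1000_0000, 0x200, 3, False).pack(), False))
fifo.append((VfInfo(0x0000_1000_0400, 0x180, 3, True).pack(), True))

# Slave core: entries are consumed in the order they were sent.
while fifo:
    word, release = fifo.popleft()
    print(VfInfo.unpack(word, release))
```

A deque stands in for the hardware FIFO only to show ordering: whatever the master core enqueues first, the slave core handles first.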
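In this possible implementation, when all the input parameters of the first and second functions are the same, the master core transmits the same slot identifier that was used for the first function's input parameters. This lets the slave core correctly obtain the second function's input parameters from the parameter buffer. In addition, the master core uses the second indication information to tell the slave core whether to release a slot after running a function. When the two functions' input parameters are wholly or partly the same, this prevents parameters in the wrong slot from being overwritten, which would otherwise leave the slave core unable to run the function correctly.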
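The slot retention rule can be modelled with a small parameter buffer. In this hypothetical Python sketch the buffer has 16 slots, which is simply what a 4-bit slot identifier can address. That figure is an inference, not one stated in the text. The master core writes parameters into a free slot, and the slave core frees a slot after running a function only when the release flag is set. A retained slot is therefore never handed out again, so parameters that a later function reuses cannot be overwritten.

```python
"""Hypothetical parameter buffer with slot retention and release."""


class ParamBuffer:
    NUM_SLOTS = 16  # assumption: 2**4 slots for a 4-bit slot ID

    def __init__(self):
        self.slots = [None] * self.NUM_SLOTS

    def write(self, params):
        """Master core: store params in the first free slot, return its ID."""
        for slot_id, content in enumerate(self.slots):
            if content is None:
                self.slots[slot_id] = list(params)
                return slot_id
        raise RuntimeError("parameter buffer full")

    def read(self, slot_id):
        """Slave core: copy the parameters of a slot (e.g. into Pong)."""
        if self.slots[slot_id] is None:
            raise LookupError(f"slot {slot_id} is empty")
        return list(self.slots[slot_id])

    def finish(self, slot_id, release):
        """Slave core, after running a VF: free the slot only if told to."""
        if release:
            self.slots[slot_id] = None


buf = ParamBuffer()
first_slot = buf.write([1, 2, 3, 4, 5])
buf.finish(first_slot, release=False)  # the second VF will reuse these parameters
other_slot = buf.write([9, 9])         # new data cannot land in the retained slot
print(first_slot, other_slot, buf.read(first_slot))  # 0 1 [1, 2, 3, 4, 5]
```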
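A second aspect of the embodiments of this application provides a communication method that may be applied to a processor including a master core and a slave core. The method includes the following steps. The master core transfers input parameters of a first function to the slave core. The master core sends first information to the slave core, which the slave core uses to obtain the first function. The slave core obtains the first function according to the first information and runs it according to its input parameters. The master core sends second information to the slave core, which the slave core uses to obtain the second function. The slave core obtains the second function according to the second information. When the master core determines that the input parameters of the second function are the same as those of the first function, it controls the slave core to run the second function based on the input parameters of the first function.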
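Optionally, in a possible implementation of the second aspect, the input parameters of the second function being the same as those of the first function includes the case where the two sets share some identical parameters. In that case the method further includes: the master core transfers difference parameters to the slave core and controls it to run the second function based on the difference parameters and the input parameters of the first function. The difference parameters are those input parameters of the second function that differ from the input parameters of the first function.
Optionally, in a possible implementation of the second aspect, the method further includes the following. The master core transfers first indication information to the slave core, instructing it to update the parameters stored in a first register to the difference parameters. A plurality of registers, including the first register and a second register, store the input parameters of the first function; the second register stores the identical partial parameters. The slave core updates, based on the first indication information, the parameters stored in the first register with the difference parameters. It then runs the second function using the difference parameters together with the identical partial parameters as its input parameters.
Optionally, in a possible implementation of the second aspect, the first indication information includes a plurality of bits corresponding to the plurality of registers. Each bit indicates whether the slave core updates, with the difference parameters, the parameters stored in the at least one register corresponding to that bit.
Optionally, in a possible implementation of the second aspect, the method further includes: when the master core determines that the input parameters of the second function differ from those of the first function, it transfers the input parameters of the second function to the slave core and controls the slave core to run the second function based on them. Optionally, while running the first function, the slave core receives the input parameters of the second function transferred by the master core.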
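Optionally, in a possible implementation of the second aspect, the processor further includes a first in first out (FIFO) queue. The master core sending first information to the slave core includes sending the first information through the FIFO queue. Likewise, the master core sending second information to the slave core includes sending the second information through the FIFO queue.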
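Optionally, in a possible implementation of the second aspect, the processor further includes a parameter buffer. The master core transferring the input parameters of the first function to the slave core includes transferring them through the parameter buffer. The method further includes the slave core copying the input parameters of the first function from the parameter buffer into a plurality of its registers. The first information includes the address of the first function, the length of the first function, a first slot identifier, and second indication information. The first slot identifier identifies the first slot in the parameter buffer in which the input parameters of the first function are stored. The second indication information indicates whether to release the first slot after the slave core finishes running the first function. When the input parameters of the second function are the same as those of the first function, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first function, and the second slot identifier is wholly or partly the same as the first slot identifier. When they differ, the second indication information specifically indicates that the first slot is released after the slave core finishes running the first function. The second information includes the address of the second function, the length of the second function, and a second slot identifier. The second slot identifier identifies the second slot in the parameter buffer in which the input parameters of the second function are stored.
For the technical effects of the second aspect or any of its possible implementations, refer to the technical effects of the first aspect or its corresponding possible implementations. Details are not described herein again.
It can be seen from the foregoing technical solutions that the embodiments of this application have the following advantage. When the master core in the processor determines that the input parameters of the second function are the same as those of the first function, it controls the slave core to run the second function based on the input parameters of the first function. In that case, the overhead of the master core transferring the second function's input parameters to the slave core is reduced, as is the time the slave core takes to run the second function.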
附图说明
图1为本申请提供的一种通信设备的结构示意图;
图2为本申请提供的一种处理器的结构示意图;
图3为本申请提供的通信方法的一个流程示意图。
具体实施方式
本申请实施例提供了一种处理器及通信方法,在第一函数的输入参数与第二函数的输入参数相同时,处理器中的主核可以控制从核基于第一函数的输入参数运行第二函数,减少传递第二函数的输入参数的开销,并减少从核运行第二函数的时间。
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获取的所有其他实施例,都属于本发明保护的范围。
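FIG. 1 is a schematic diagram of the hardware structure of a communication device according to an embodiment of this application. The communication device 100 shown in FIG. 1 includes a processor 101, a communication port 102, and a memory 103, which may be communicatively connected to one another through a bus.
The communication device may be an in-vehicle visual perception device, a terminal device, or the like. The terminal device may include a head-mounted display (HMD), which may be a combination of a virtual reality (VR) box and a terminal or an all-in-one VR headset; a personal computer (PC); an augmented reality (AR) device; a mixed reality (MR) device; and so on. The terminal device may also include a cellular phone, a smartphone, a personal digital assistant (PDA), a tablet computer, a laptop computer, an in-vehicle terminal, and so on; this is not specifically limited here.
The processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions to be performed by the units in the communication device of the embodiments of this application.
The processor 101 may also be an integrated circuit chip with signal processing capability. It may further be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 101 includes a master core and a slave core; for details of the processor 101, refer to FIG. 2, which are not repeated here.
The memory 103 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 103 may store a program; when the program stored in the memory 103 is executed by the processor 101, the processor 101 and the communication port 102 are configured to perform the functions of the communication device 100.
The communication port 102 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the communication device 100 and other devices or a communication network.
It should be noted that although the communication device 100 in FIG. 1 shows only the memory, the processor, and the communication port, a person skilled in the art should understand that, in a specific implementation, the communication device 100 further includes other components necessary for normal operation. Depending on specific needs, the communication device 100 may also include hardware components that implement other additional functions. Moreover, the communication device 100 may include only the components necessary to implement the embodiments of this application, rather than all the components shown in FIG. 1.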
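FIG. 2 is a schematic structural diagram of a processor, which may be the processor in the communication device of FIG. 1. The processor 101 may also be understood as a multi-core processor. The processor 101 may include a master core 1011, a slave core 1012, a first-in-first-out (FIFO) queue 1013, and a parameter buffer 1014.
The master core 1011 may also be understood as a main processor, and the slave core 1012 as a vector processor. The slave core 1012 obtains tasks from the master core 1011 and runs them in parallel with the master core 1011; in other words, the master core 1011 and the slave core 1012 can perform logic computation in parallel, which increases the overall computing speed of the processor 101. In addition, the slave core 1012 includes a Pong scalar register 10121, a Ping scalar register 10122, and an instruction buffer 10123.
The master core 1011 is configured to transfer the input parameters of a function to the slave core 1012 through the parameter buffer 1014.
The master core 1011 is further configured to send information about the function to the slave core 1012 through the FIFO queue 1013; this information is used by the slave core 1012 to obtain the function.
The slave core 1012 is configured to obtain the function based on its information and run the function based on its input parameters.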
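The FIFO queue 1013 is used by the master core 1011 to transmit information about at least one vector function (VF) to the slave core 1012. The information about each VF includes the VF's address (for example, a base address), the VF's length, the VF's slot identifier, and indication information. The address and length allow the slave core to obtain the VF, and the slot identifier identifies the slot in the parameter buffer 1014 in which the VF's input parameters are stored. The indication information indicates whether the slot holding the VF's input parameters is released after the slave core 1012 finishes running the VF; equivalently, whether the VF input parameters stored in that slot may then be overwritten. After obtaining the VF information, the slave core 1012 can fetch the VF based on its address and length, and can determine from the slot identifier which slots of the parameter buffer 1014 hold the VF's input parameters. Having obtained the VF and its input parameters, the slave core 1012 runs the VF with those parameters.
Optionally, the base address of a VF points to the location in external memory where the VF is stored, and the slave core fetches the VF from the external memory based on the base address and the length.
For example, the base address of a VF is 48 bits, the length of the VF is 12 bits, and the slot identifier of the VF is 4 bits.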
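The three example field widths add up to exactly 64 bits (48 + 12 + 4), so a VF's address, length, and slot identifier could fit in one 64-bit FIFO entry. The Python sketch below packs and unpacks such a descriptor. The bit order (base address in the high bits, slot identifier in the low bits) and the class name `VfDescriptor` are assumptions for illustration; the text gives only the widths. Because the release flag (indication information) needs at least one more bit, a real FIFO entry is presumably wider than 64 bits or laid out differently, so the flag is left out here.

```python
from dataclasses import dataclass

# Field widths from the example in the text; the bit layout itself is hypothetical.
BASE_BITS, LEN_BITS, SLOT_BITS = 48, 12, 4


@dataclass(frozen=True)
class VfDescriptor:
    base_addr: int  # 48-bit base address of the VF in external memory
    length: int     # 12-bit length of the VF
    slot_id: int    # 4-bit slot identifier (at most 16 slots addressable)

    def pack(self) -> int:
        """Pack as [base_addr:48 | length:12 | slot_id:4] into one 64-bit word."""
        for value, bits in ((self.base_addr, BASE_BITS),
                            (self.length, LEN_BITS),
                            (self.slot_id, SLOT_BITS)):
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{value:#x} does not fit in {bits} bits")
        return ((self.base_addr << (LEN_BITS + SLOT_BITS))
                | (self.length << SLOT_BITS)
                | self.slot_id)

    @staticmethod
    def unpack(word: int) -> "VfDescriptor":
        """Inverse of pack()."""
        slot_id = word & ((1 << SLOT_BITS) - 1)
        length = (word >> SLOT_BITS) & ((1 << LEN_BITS) - 1)
        base_addr = word >> (LEN_BITS + SLOT_BITS)
        return VfDescriptor(base_addr, length, slot_id)


desc = VfDescriptor(base_addr=0x123456789ABC, length=0x100, slot_id=5)
word = desc.pack()
print(hex(word))  # 0x123456789abc1005
assert VfDescriptor.unpack(word) == desc
```

The range checks reject values that would silently spill into a neighbouring field.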
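The parameter buffer 1014 includes a plurality of slots for storing VF input parameters; in other words, the parameter buffer 1014 is used by the master core 1011 to transfer the input parameters of at least one VF to the slave core 1012. The process may be as follows: the master core 1011 writes the VF's input parameters into the parameter buffer 1014. Based on the VF's slot identifier, the slave core 1012 copies the parameters stored in the corresponding slot of the parameter buffer 1014 into the Pong scalar register 10121. When the slave core 1012 is ready to run the VF, it copies the parameters from the Pong scalar register 10121 into the Ping scalar register 10122, and then runs the VF using the parameters in the Ping scalar register 10122.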
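It should be noted that the FIFO queue 1013 and the parameter buffer 1014 are used by the master core 1011 and the slave core 1012 to transfer VF information and VF input parameters. Their location is not limited here: they may reside in the memory of the master core 1011, in the memory of the slave core 1012, or outside both cores but within a storage area of the processor 101 (as in the structure shown in FIG. 2).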
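In addition, the Pong scalar register 10121 and the Ping scalar register 10122 are essentially two scalar registers used for data buffering. Using the two together allows data to be transferred continuously, which increases the data transfer rate. Data held in a single scalar register is easily overwritten during transfer and processing, whereas with this structure one scalar register is always in use while the other is being filled; that is, the two scalar registers are read and written alternately. In other words, the Ping scalar register 10122 may hold the input parameters of the VF that the slave core 1012 is about to run, and the Pong scalar register 10121 holds the input parameters of the next VF to be executed by the slave core 1012.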
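The copy path described above (parameter-buffer slot to Pong, Pong to Ping, run from Ping) can be modelled in a few lines. This is a simplified, single-threaded Python sketch. The class names, the 16-slot buffer size (what a 4-bit slot identifier can address), and treating each bank as 32 words are assumptions. Real hardware would fill Pong concurrently with execution rather than step by step.

```python
NUM_WORDS = 32  # one bank = 32 x 4-byte S registers (example size from the text)


class ParameterBuffer:
    """Slots written by the master core; 16 slots assumed for a 4-bit slot ID."""

    def __init__(self, num_slots: int = 16):
        self.slots = [[0] * NUM_WORDS for _ in range(num_slots)]

    def write(self, slot_id: int, params: list[int]) -> None:
        if len(params) > NUM_WORDS:
            raise ValueError("too many parameters for one bank")
        # Zero-pad so every slot holds a full bank image.
        self.slots[slot_id] = list(params) + [0] * (NUM_WORDS - len(params))


class PingPongRegs:
    def __init__(self):
        self.pong = [0] * NUM_WORDS  # staging bank: parameters of the next VF
        self.ping = [0] * NUM_WORDS  # active bank: parameters of the VF being run

    def prefetch(self, buf: ParameterBuffer, slot_id: int) -> None:
        """Slot -> Pong. Ping is not touched, so a running VF is unaffected."""
        self.pong = list(buf.slots[slot_id])

    def promote(self) -> None:
        """Pong -> Ping, done when the slave core is ready to run the next VF."""
        self.ping = list(self.pong)

    def run(self, vf):
        return vf(self.ping)


buf, regs = ParameterBuffer(), PingPongRegs()
buf.write(0, [1, 2, 3])
buf.write(1, [10, 20])
regs.prefetch(buf, 0)
regs.promote()
regs.prefetch(buf, 1)   # next VF staged while the first one is "running"
print(regs.run(sum))    # 6: Ping still holds the first VF's parameters
regs.promote()
print(regs.run(sum))    # 30
```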
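Optionally, the Pong scalar register 10121 and the Ping scalar register 10122 each include a plurality of registers, which store the parameters held in the slot indicated by the VF's slot identifier.
For example, the Pong scalar register 10121 and the Ping scalar register 10122 each include 64 2-byte registers (s0-s63) or 32 4-byte registers (S0-S31).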
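Since 64 × 2 bytes and 32 × 4 bytes are both 128 bytes, the two register namings can be read as two views of the same bank. In that reading, S register k overlaps s registers 2k and 2k+1, which is the pairing the bitmask examples later in the text rely on. The sketch below shows both views with Python's standard `struct` module. Little-endian order (so that s2k is the low half of Sk) and the helper names `view_s16` and `view_s32` are assumptions.

```python
import struct

BANK_BYTES = 128  # 64 x 2-byte s registers == 32 x 4-byte S registers


def view_s16(bank: bytes) -> list[int]:
    """Read the bank as s0..s63 (unsigned 16-bit, little-endian assumed)."""
    return list(struct.unpack("<64H", bank))


def view_s32(bank: bytes) -> list[int]:
    """Read the same bank as S0..S31 (unsigned 32-bit, little-endian assumed)."""
    return list(struct.unpack("<32I", bank))


bank = struct.pack("<64H", *range(64))  # s_k = k
s, S = view_s16(bank), view_s32(bank)
# S3 overlaps s6 (low half) and s7 (high half).
assert S[3] == s[6] | (s[7] << 16)
print(len(bank), hex(S[3]))  # 128 0x70006
```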
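The instruction buffer 10123 stores the instructions located at the base address; these instructions are used by the slave core 1012 to execute the VF. The base address may point to the location in external memory where the VF is stored, and the external memory may be the memory 103 in FIG. 1.
The following uses two functions that are VFs as an example to describe the functions of each structure in the processor 101 in several cases. It can be understood that, in this embodiment, the master core 1011 and the slave core 1012 may exchange more VFs and more VF information; two functions are used here only as an example.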
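The master core 1011 is configured to transfer the input parameters of a first VF to the slave core 1012 through the parameter buffer 1014. The master core 1011 is further configured to send first information to the slave core 1012 through the FIFO queue 1013; the first information is used by the slave core 1012 to obtain the first VF.
The slave core 1012 is configured to obtain the first VF based on the first information and run the first VF based on its input parameters.
The master core 1011 is further configured to send second information to the slave core 1012 through the FIFO queue 1013; the second information is used by the slave core 1012 to obtain a second VF.
The slave core 1012 is configured to obtain the second VF based on the second information.
Optionally, the first information includes the base address of the first VF, the length of the first VF, a first slot identifier, and second indication information. The base address points to the first VF; the first slot identifier identifies the first slot in the parameter buffer 1014 in which the first VF's input parameters are stored; and the second indication information indicates whether the first slot is released after the slave core 1012 finishes running the first VF. Releasing the first slot means that the parameters in it may be overwritten by subsequent parameters; not releasing it means that they are protected from being overwritten by subsequent parameters.
The second information includes the base address of the second VF, the length of the second VF, and a second slot identifier. The base address points to the second VF, and the second slot identifier identifies the second slot in the parameter buffer 1014 in which the second VF's input parameters are stored. Releasing the second slot means that the parameters in it may be overwritten by subsequent parameters; not releasing it means that they are protected from being overwritten by subsequent parameters.
Optionally, the second information may further include third indication information, which indicates whether the second slot is released after the slave core 1012 finishes running the second VF.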
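The release semantics of the second and third indication information amount to simple slot bookkeeping. The sketch below is a hypothetical model; the `SlotTable` class and its methods are not from the text. A slot whose VF finishes with the release flag cleared stays reserved, so its parameters can be reused by a later VF. A released slot becomes available for the master core to overwrite.

```python
class SlotTable:
    """Hypothetical bookkeeping for parameter-buffer slots."""

    def __init__(self, num_slots: int = 16):  # 16 slots assumed for a 4-bit slot ID
        self.params = [None] * num_slots
        self.in_use = [False] * num_slots

    def allocate(self, params: list[int]) -> int:
        """Master core: write parameters into the first free slot and return its ID."""
        for slot_id, busy in enumerate(self.in_use):
            if not busy:
                self.in_use[slot_id] = True
                self.params[slot_id] = list(params)
                return slot_id
        raise RuntimeError("no free slot; wait for the slave core to release one")

    def finish(self, slot_id: int, release: bool) -> None:
        """Slave core: called after the VF that used slot_id has run."""
        if release:
            self.in_use[slot_id] = False  # contents may now be overwritten


table = SlotTable(num_slots=2)
first = table.allocate([1, 2, 3])
table.finish(first, release=False)   # retained: a second VF may reuse this slot ID
second = table.allocate([4, 5])      # so the master core must pick another slot
print(first, second, table.params[first])  # 0 1 [1, 2, 3]
```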
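In this embodiment, the functions of the master core 1011 and the slave core 1012 differ depending on whether the input parameters of the first VF and the second VF are the same or different. The cases are described below.
Case 1: The input parameters of the second VF are the same as those of the first VF.
In the embodiments of this application, this case is further divided into two sub-cases, described below.
1. The input parameters of the second VF are entirely the same as those of the first VF.
The master core 1011 is further configured to, when determining that the input parameters of the first VF and the second VF are the same, control the slave core 1012 to run the second VF based on the input parameters of the first VF.
Optionally, when the input parameters of the second VF are the same as those of the first VF, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first VF, and the second slot identifier is the same as the first slot identifier.
The master core 1011 may control the slave core 1012 to run the second VF based on the first VF's input parameters in any of the following ways:
1.1. The second indication information in the first information may further indicate that the input parameters of the first VF and the second VF are the same.
1.2. The second indication information in the first information may further indicate that the second VF reuses the slot holding the first VF's input parameters.
1.3. The master core 1011 transfers first indication information to the slave core 1012 through the parameter buffer 1014. The first indication information includes a plurality of bits corresponding to the plurality of registers in the Pong scalar register 10121, and each bit indicates that the parameters stored in the at least one register corresponding to it have not changed. The plurality of registers in the Pong scalar register 10121 store the first VF's input parameters.
Optionally, the first indication information is 32 bits, and each of its bits corresponds to two 2-byte registers in the Pong scalar register 10121; that is, the first indication information covers the 64 2-byte registers (s0-s63), or equivalently the 32 4-byte registers (S0-S31).
Optionally, the first indication information may use 0 or 1 to indicate that the parameters stored in the registers of the Pong scalar register 10121 corresponding to a bit are unchanged; the following description uses 0 to mean "unchanged" as an example.
For example, the first indication information is 0x0000, whose binary form is 0000 0000 0000 0000. Counting from right to left starting at 0, bits 0 through 15 are all 0, so the parameters stored in the two s registers (or the one S register) corresponding to each bit are unchanged.
Here, the input parameters of the first VF being the same as those of the second VF means that all of the second VF's parameters are the same as the first VF's; specifically, the parameters at the same positions are identical.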
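Each bit k of the 32-bit first indication information covers two s registers (s2k and s2k+1), that is, one S register Sk. The hypothetical helper below lists the registers covered by the set bits of a mask. Under the "0 = unchanged" convention of way 1.3, a mask of 0 covers nothing, meaning the whole Pong bank is kept as it is. The same helper also decodes the 0x4218 example of the partial case described next.

```python
def flagged_registers(mask: int, width: int = 32) -> list[tuple[int, tuple[int, int]]]:
    """Return (k, (2k, 2k+1)) for every set bit k: S register k and its two s registers."""
    if not 0 <= mask < (1 << width):
        raise ValueError(f"mask must fit in {width} bits")
    return [(k, (2 * k, 2 * k + 1)) for k in range(width) if (mask >> k) & 1]


print(flagged_registers(0x0000))  # []
print(flagged_registers(0x4218))  # [(3, (6, 7)), (4, (8, 9)), (9, (18, 19)), (14, (28, 29))]
```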
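2. The input parameters of the second VF are partially the same as those of the first VF.
When they are partially the same, the second indication information in the first information specifically indicates that the parameters stored in the first slot are retained after the slave core 1012 finishes running the first VF; in addition, some of the second slot identifiers are the same as the first slot identifiers while the others differ.
The master core 1011 is further configured to, when determining that the input parameters of the second VF are partially the same as those of the first VF, transfer difference parameters and the first indication information to the slave core 1012 through the parameter buffer 1014, and control the slave core 1012 to run the second VF based on the difference parameters, the first indication information, and the first VF's input parameters. The difference parameters are those input parameters of the second VF that differ from the first VF's. The first indication information includes a plurality of bits corresponding to the plurality of registers in the Pong scalar register 10121, and each bit indicates whether the parameters stored in the at least one register corresponding to it have changed.
Optionally, the first indication information is 32 bits, and each of its bits corresponds to two 2-byte registers in the Pong scalar register 10121; that is, the first indication information covers the 64 2-byte registers (s0-s63), or equivalently the 32 4-byte registers (S0-S31).
Optionally, the first indication information may use 0 or 1 to indicate whether the parameters stored in the corresponding registers of the Pong scalar register 10121 have changed; the following description uses 1 to mean "changed" and 0 to mean "unchanged" as an example.
For example, the first indication information is 0x4218, whose binary form is 0100 0010 0001 1000. Counting from right to left starting at 0, bits 3, 4, 9, and 14 are 1. Each 1 bit means that the parameters stored in the corresponding two s registers (or one S register) of the Pong scalar register 10121 have changed, and each 0 bit means that the parameters in the corresponding two s registers (or one S register) are unchanged. That is, the parameters in s6 and s7 (bit 3), s8 and s9 (bit 4), s18 and s19 (bit 9), and s28 and s29 (bit 14) have changed, while the registers corresponding to the 0 bits are unchanged. The difference parameters carry the new values for the registers whose bits are 1. For example, the parameters transferred from the master core 1011 to the slave core 1012 consist of the first indication information followed by the difference parameters: the first 32 bits are the first indication information, and the subsequent bits carry the difference parameters. Specifically, the second 32 bits are the parameters to be written into s6 and s7, the third 32 bits into s8 and s9, the fourth 32 bits into s18 and s19, and the fifth 32 bits into s28 and s29.
Here, the input parameters of the first VF being partially the same as those of the second VF means that some of the second VF's parameters are the same as the first VF's; specifically, the parameters at some of the same positions are identical.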
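The message layout in the 0x4218 example (one 32-bit mask word, then one 32-bit value per set bit in ascending bit order) can be sketched as an encoder on the master side and a decoder on the slave side. The Python below is illustrative only. The function names are hypothetical, each parameter is treated as one 32-bit S register value, and little-endian byte order is assumed.

```python
import struct

NUM_S = 32  # S0..S31; bit k of the mask covers S register k (= s2k, s2k+1)


def encode_update(old: list[int], new: list[int]) -> bytes:
    """Master core: mask word followed by the new value of every changed S register."""
    if len(old) != NUM_S or len(new) != NUM_S:
        raise ValueError("expected 32 register values")
    mask, diffs = 0, []
    for k, (a, b) in enumerate(zip(old, new)):
        if a != b:
            mask |= 1 << k
            diffs.append(b)
    return struct.pack(f"<{1 + len(diffs)}I", mask, *diffs)


def apply_update(bank: list[int], msg: bytes) -> list[int]:
    """Slave core: patch a copy of the Pong bank with the difference parameters."""
    (mask,) = struct.unpack_from("<I", msg, 0)
    changed = [k for k in range(NUM_S) if (mask >> k) & 1]
    if len(msg) != 4 * (1 + len(changed)):
        raise ValueError("message length does not match the mask")
    out = list(bank)
    for i, k in enumerate(changed):
        (out[k],) = struct.unpack_from("<I", msg, 4 * (1 + i))
    return out


old = list(range(NUM_S))
new = list(old)
for k, v in zip((3, 4, 9, 14), (100, 101, 102, 103)):
    new[k] = v
msg = encode_update(old, new)
print(hex(struct.unpack_from("<I", msg)[0]), len(msg))  # 0x4218 20
assert apply_update(old, msg) == new
```

The message is five 32-bit words, matching the five words enumerated in the example.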
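Case 2: The input parameters of the second VF differ from those of the first VF.
When they differ, the second indication information in the first information specifically indicates that the first slot is released after the slave core 1012 finishes running the first VF. Releasing the slot means that the parameters in it may be overwritten by subsequent parameters, or equivalently that the first slot is freed so that the master core 1011 can write the second VF's input parameters into it.
The master core 1011 is further configured to, when determining that the input parameters of the first VF and the second VF differ, transfer the second VF's input parameters to the slave core 1012 through the parameter buffer 1014, and control the slave core 1012 to run the second VF based on them.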
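For example, if the input parameters of the first VF are 1, 2, 3, 4, 5 and those of the second VF are 5, 4, 3, 2, 1, the second VF's input parameters can be regarded as different from the first VF's: the same values appear in a different order, and every position except the third (3 in both) holds a different parameter.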
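Optionally, the input parameters of the second VF differing from those of the first VF may be understood as the two VFs having no parameters in common, or as their having no identical parameters at the same positions.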
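The master core's choice among the cases above can be expressed as a position-by-position comparison. This is a hypothetical sketch; the `Case` enum and `classify` are illustrative names. It also returns the release flag that, per the text, belongs in the second indication information. Because that flag travels with the first information, the master core presumably has to make this comparison when it dispatches the first VF. A strict positional test treats the 1, 2, 3, 4, 5 versus 5, 4, 3, 2, 1 example as partially the same, since both have 3 in the third position.

```python
from enum import Enum


class Case(Enum):
    SAME = "same"            # keep first slot; second VF runs on the first VF's parameters
    PARTIAL = "partial"      # keep first slot; send mask + difference parameters
    DIFFERENT = "different"  # release first slot; send the full second parameter set


def classify(first: list[int], second: list[int]) -> tuple[Case, bool]:
    """Return the case and whether the first slot should be released after the first VF."""
    if len(first) != len(second):
        raise ValueError("parameter lists are compared position by position")
    matches = [a == b for a, b in zip(first, second)]
    if all(matches):
        case = Case.SAME
    elif any(matches):
        case = Case.PARTIAL
    else:
        case = Case.DIFFERENT
    return case, case is Case.DIFFERENT


print(classify([1, 2, 3], [1, 2, 3]))              # (<Case.SAME: 'same'>, False)
print(classify([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # (<Case.PARTIAL: 'partial'>, False)
print(classify([1, 2], [3, 4]))                    # (<Case.DIFFERENT: 'different'>, True)
```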
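Further, the master core 1011 writes the first VF's input parameters into the parameter buffer 1014 and writes the first VF's information into the FIFO queue 1013. The first VF's information includes its base address, its length, the first slot identifier, and the second indication information. Based on the first slot identifier, the slave core 1012 copies the parameters stored in the corresponding first slot of the parameter buffer 1014 into the Pong scalar register 10121. When the slave core 1012 is ready to run the first VF, it copies the first VF's input parameters from the Pong scalar register 10121 into the Ping scalar register 10122. After fetching the first VF based on its information (as described above), the slave core 1012 runs the first VF using the input parameters stored in the Ping scalar register 10122.
While the slave core 1012 is running the first VF, the master core 1011 may write the second VF's input parameters into the parameter buffer 1014 and the second VF's information into the FIFO queue 1013. The second VF's information includes its base address, its length, the second slot identifier, and the third indication information. After detecting the newly added second VF information in the FIFO queue 1013, the slave core 1012 copies, based on the slot identifier in that information, the parameters stored in the corresponding slot of the parameter buffer 1014 into the Pong scalar register 10121. After finishing the first VF, the slave core 1012 copies the second VF's input parameters from the Pong scalar register 10121 into the Ping scalar register 10122. After fetching the second VF based on its information, the slave core 1012 runs the second VF using the input parameters stored in the Ping scalar register 10122. It can be understood that, when there are more VFs, the slave core runs them by repeating the cycle described for the first and second VFs; details are not repeated here.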
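The overlap described above, in which the next VF's parameters are staged in Pong while the current VF runs from Ping, can be illustrated with a sequential Python model that records the order of events. The function name, the job tuple format, and the use of `collections.deque` as the FIFO queue are assumptions. Slot reuse and real concurrency are not modelled, so the trace shows ordering only, not timing.

```python
from collections import deque


def run_pipeline(jobs):
    """jobs: list of (name, params, fn). Returns (results, trace)."""
    fifo, param_buffer = deque(), {}
    # Master core: write each VF's parameters into a slot and enqueue its information.
    for slot_id, (name, params, fn) in enumerate(jobs):
        param_buffer[slot_id] = list(params)
        fifo.append({"name": name, "slot": slot_id, "fn": fn})

    results, trace = [], []
    if not fifo:
        return results, trace
    nxt = fifo.popleft()
    pong = list(param_buffer[nxt["slot"]])       # slot -> Pong for the first VF
    trace.append(("prefetch", nxt["name"]))
    while nxt is not None:
        running, ping = nxt, list(pong)          # Pong -> Ping, then run from Ping
        nxt = fifo.popleft() if fifo else None
        if nxt is not None:                      # stage the next VF while `running` executes
            pong = list(param_buffer[nxt["slot"]])
            trace.append(("prefetch", nxt["name"]))
        results.append(running["fn"](ping))
        trace.append(("run", running["name"]))
    return results, trace


results, trace = run_pipeline([("vf1", [1, 2], sum), ("vf2", [3, 4], sum), ("vf3", [5], max)])
print(results)  # [3, 7, 5]
print(trace)    # [('prefetch', 'vf1'), ('prefetch', 'vf2'), ('run', 'vf1'), ('prefetch', 'vf3'), ('run', 'vf2'), ('run', 'vf3')]
```

Each VF's prefetch is logged before the previous VF's run completes, which is the alternating read/write pattern of the Ping and Pong registers.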
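In this embodiment, first, the master core 1011 is configured to, when determining that the input parameters of the second function are the same as those of the first function, control the slave core 1012 to run the second function based on the first function's input parameters. This reduces the overhead of transferring the second function's input parameters and the time the slave core 1012 takes to run the second function. Second, when determining that the input parameters are partially the same, the master core 1011 transfers the difference parameters and the first indication information to the slave core 1012 through the parameter buffer 1014, and controls the slave core 1012 to run the second function based on them and the first function's input parameters. In the prior art, all of the second function's input parameters are transmitted; here, when enough parameters are shared, the difference parameters plus the first indication information occupy less storage than the full set of the second function's input parameters, which reduces the transfer overhead and the time the slave core 1012 takes to run the second function. Third, by dividing the storage into the FIFO queue 1013 and the parameter buffer 1014, and into the Pong scalar register 10121 and the Ping scalar register 10122, the slave core 1012 can obtain the second function's input parameters while it is still running the first function. That is, the two scalar registers are read and written alternately, which improves the efficiency with which the slave core 1012 runs multiple functions.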
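The saving claimed for the partial case can be quantified with the example sizes from the text. Assume the full parameter set occupies all 32 four-byte S registers (128 bytes), and the partial update is one 32-bit mask plus one 32-bit word per changed S register. The update then costs 4 + 4n bytes for n changed registers. Treating a full transfer as always covering the whole bank is an assumption; a VF with fewer parameters would shift the break-even point.

```python
WORD_BYTES = 4  # one S register / one 32-bit word
NUM_S = 32      # S0..S31


def full_transfer_bytes(num_params: int = NUM_S) -> int:
    return WORD_BYTES * num_params


def diff_transfer_bytes(num_changed: int) -> int:
    # 32-bit first indication information + one 32-bit difference word per changed register
    return WORD_BYTES * (1 + num_changed)


print(full_transfer_bytes(), diff_transfer_bytes(4))  # 128 20  (the 0x4218 example)
print(max(n for n in range(NUM_S + 1)
          if diff_transfer_bytes(n) < full_transfer_bytes()))  # 30
```

Under these assumptions the partial update is smaller as long as at most 30 of the 32 S registers change. At 31 changed registers it breaks even, and at 32 it costs one extra word.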
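The processor provided in the embodiments of this application has been described above. The communication method provided in the embodiments of this application is described below with reference to the processor architecture of FIG. 2.
Referring to FIG. 3, an embodiment of the communication method provided in the embodiments of this application includes steps 301 to 310.
Step 301: The master core transfers the input parameters of the first function to the slave core.
The master core stores the input parameters of the first function (for example, the first VF) in the parameter buffer, from which the slave core can obtain the first VF's parameters.
Optionally, the parameter buffer has a plurality of slots into which the master core writes the first VF's parameters. Correspondingly, the slave core reads the first VF's parameters from a slot of the parameter buffer and stores them in its Pong register.
For example, the Pong scalar register includes 64 2-byte registers (s0-s63) or 32 4-byte registers (S0-S31).
Step 302: The master core sends the first information to the slave core.
The master core sends the first information to the slave core through the FIFO queue.
Optionally, the first information includes the base address of the first VF, the length of the first VF, a first slot identifier, and second indication information. The base address points to the first VF; the first slot identifier identifies the first slot in the parameter buffer in which the first VF's input parameters are stored; and the second indication information indicates whether the first slot is released after the slave core finishes running the first VF. Releasing the first slot means that the parameters in it may be overwritten by subsequent parameters; not releasing it means that they are protected from being overwritten by subsequent parameters.
For example, the base address of the first VF is 48 bits, its length is 12 bits, and the first slot identifier is 4 bits.
Step 303: The slave core obtains the first function based on the first information.
After receiving the first information, the slave core can obtain the first VF based on the first VF's base address and length in the first information.
Optionally, the slave core fetches the first VF from the memory shown in FIG. 1 based on the first VF's base address and length in the first information.
Step 304: The slave core runs the first function based on the first function's input parameters.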
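The slave core can use the first slot identifier to read the first VF's input parameters from the parameter buffer and store them in its Pong register, then copy the parameters from the Pong register to the Ping register, and run the first VF using the parameters in the Ping register.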
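After obtaining the first VF's input parameters and the first VF, the slave core runs the first VF based on those input parameters.
Step 305: The master core sends the second information to the slave core.
The master core sends the second information to the slave core through the FIFO queue.
The second information includes the base address of the second VF, the length of the second VF, and a second slot identifier. The base address points to the second VF, and the second slot identifier identifies the second slot in the parameter buffer in which the second VF's input parameters are stored. Releasing the second slot means that the parameters in it may be overwritten by subsequent parameters; not releasing it means that they are protected from being overwritten by subsequent parameters.
Optionally, the second information may further include third indication information, which indicates whether the second slot is released after the slave core finishes running the second VF.
Step 306: The slave core obtains the second function based on the second information.
After receiving the second information, the slave core can obtain the second VF based on the second VF's base address and length in the second information.
Optionally, the slave core fetches the second VF from the memory shown in FIG. 1 based on the second VF's base address and length in the second information.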
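In this embodiment, the steps performed by the master core and the slave core differ depending on whether the input parameters of the first VF and the second VF are the same. If they are the same, this embodiment further includes steps 307 and 308; if they differ, it further includes steps 309 and 310. These cases are described below.
Case 1: The input parameters of the second VF are the same as those of the first VF.
Step 307: When determining that the input parameters of the second function are the same as those of the first function, the master core controls the slave core to run the second function based on the first function's input parameters. This step is optional.
Optionally, when the input parameters of the second VF are the same as those of the first VF, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first VF, and the second slot identifier is the same as the first slot identifier.
In the embodiments of this application, this case is further divided into two sub-cases, described below.
1. The input parameters of the second VF are entirely the same as those of the first VF.
The master core may control the slave core to run the second VF based on the first VF's input parameters in any of the following ways:
1.1. The second indication information in the first information may further indicate that the input parameters of the first VF and the second VF are the same.
1.2. The second indication information in the first information may further indicate that the second VF reuses the slot holding the first VF's input parameters.
1.3. The master core transfers first indication information to the slave core through the parameter buffer. The first indication information includes a plurality of bits corresponding to the plurality of registers in the Pong scalar register, and each bit indicates that the parameters stored in the at least one register corresponding to it have not changed. The plurality of registers in the Pong scalar register store the first VF's input parameters.
Optionally, the first indication information is 32 bits, and each of its bits corresponds to two 2-byte registers in the Pong scalar register; that is, the first indication information covers the 64 2-byte registers (s0-s63), or equivalently the 32 4-byte registers (S0-S31).
Optionally, the first indication information may use 0 or 1 to indicate that the parameters stored in the registers of the Pong scalar register corresponding to a bit are unchanged; the following description uses 0 to mean "unchanged" as an example.
For example, the first indication information is 0x0000, whose binary form is 0000 0000 0000 0000. Counting from right to left starting at 0, bits 0 through 15 are all 0, so the parameters stored in the two s registers (or the one S register) corresponding to each bit are unchanged.
Here, the input parameters of the first VF being the same as those of the second VF means that all of the second VF's parameters are the same as the first VF's; specifically, the parameters at the same positions are identical.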
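2. The input parameters of the second VF are partially the same as those of the first VF.
When they are partially the same, the second indication information in the first information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first VF, so that the first function's input parameters can be obtained correctly when they are reused later. In addition, some of the second slot identifiers are the same as the first slot identifiers while the others differ.
When determining that the input parameters of the second VF are partially the same as those of the first VF, the master core transfers the difference parameters and the first indication information to the slave core through the parameter buffer, and controls the slave core to run the second VF based on the difference parameters, the first indication information, and the first VF's input parameters. The difference parameters are those input parameters of the second VF that differ from the first VF's. The first indication information includes a plurality of bits corresponding to the plurality of registers in the Pong scalar register, and each bit indicates whether the parameters stored in the at least one register corresponding to it have changed.
Optionally, the first indication information is 32 bits, and each of its bits corresponds to two 2-byte registers in the Pong scalar register; that is, the first indication information covers the 64 2-byte registers (s0-s63), or equivalently the 32 4-byte registers (S0-S31).
Optionally, the first indication information may use 0 or 1 to indicate whether the parameters stored in the corresponding registers of the Pong scalar register have changed; the following description uses 1 to mean "changed" and 0 to mean "unchanged" as an example.
For example, the first indication information is 0x4218, whose binary form is 0100 0010 0001 1000. Counting from right to left starting at 0, bits 3, 4, 9, and 14 are 1. Each 1 bit means that the parameters stored in the corresponding two s registers (or one S register) of the Pong scalar register have changed, and each 0 bit means that the parameters in the corresponding two s registers (or one S register) are unchanged. That is, the parameters in s6 and s7 (bit 3), s8 and s9 (bit 4), s18 and s19 (bit 9), and s28 and s29 (bit 14) have changed, while the registers corresponding to the 0 bits are unchanged. The difference parameters carry the new values for the registers whose bits are 1. For example, the parameters transferred from the master core to the slave core consist of the first indication information followed by the difference parameters: the first 32 bits are the first indication information, and the subsequent bits carry the difference parameters. Specifically, the second 32 bits are the parameters to be written into s6 and s7, the third 32 bits into s8 and s9, the fourth 32 bits into s18 and s19, and the fifth 32 bits into s28 and s29.
Here, the input parameters of the first VF being partially the same as those of the second VF means that some of the second VF's parameters are the same as the first VF's; specifically, the parameters at some of the same positions are identical.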
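Step 308: The slave core runs the second function based on the first function's input parameters. This step is optional.
When the input parameters of the second VF are entirely the same as those of the first VF, after receiving the first information the slave core can run the second VF with the first VF's input parameters according to the second indication information, which indicates either that the input parameters of the two VFs are the same or that the second VF reuses the slot holding the first VF's input parameters.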
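On the slave side, step 308 reduces to checking whether any of the three signalling variants is present: the same-parameters flag (1.1), the slot-reuse flag (1.2), or an all-unchanged mask (1.3). If one is, the Ping bank is left as it is and the second VF runs on it. The dataclass fields and function names below are hypothetical. The `reload` callable stands in for the normal slot-to-Pong-to-Ping path of step 310.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReuseHint:
    same_params: bool = False             # variant 1.1: parameters are the same
    reuse_slot: bool = False              # variant 1.2: reuse the first VF's slot
    unchanged_mask: Optional[int] = None  # variant 1.3: 0 means every register is unchanged


def can_skip_reload(hint: ReuseHint) -> bool:
    return hint.same_params or hint.reuse_slot or hint.unchanged_mask == 0


def run_second_vf(ping: list[int], vf: Callable[[list[int]], int],
                  hint: ReuseHint, reload: Callable[[], list[int]]) -> int:
    """Run the second VF on the retained Ping contents, or reload if reuse is not signalled."""
    params = ping if can_skip_reload(hint) else reload()
    return vf(params)


ping = [1, 2, 3]
print(run_second_vf(ping, sum, ReuseHint(same_params=True), reload=lambda: [9]))  # 6
print(run_second_vf(ping, sum, ReuseHint(unchanged_mask=0), reload=lambda: [9]))  # 6
print(run_second_vf(ping, sum, ReuseHint(), reload=lambda: [9]))                   # 9
```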
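Case 2: The input parameters of the second VF differ from those of the first VF.
Step 309: When determining that the input parameters of the second function differ from those of the first function, the master core transfers the second function's input parameters. This step is optional.
When they differ, the second indication information in the first information specifically indicates that the first slot is released after the slave core finishes running the first VF. Releasing the slot means that the parameters in it may be overwritten by subsequent parameters, or equivalently that the first slot is freed so that the master core can write the second VF's input parameters into it.
When determining that the input parameters of the first VF and the second VF differ, the master core transfers the second VF's input parameters to the slave core through the parameter buffer.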
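For example, if the input parameters of the first VF are 1, 2, 3, 4, 5 and those of the second VF are 5, 4, 3, 2, 1, the second VF's input parameters can be regarded as different from the first VF's: the same values appear in a different order, and every position except the third (3 in both) holds a different parameter.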
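Optionally, the input parameters of the second VF differing from those of the first VF may be understood as the two VFs having no parameters in common, or as their having no identical parameters at the same positions.
Step 310: The slave core runs the second function based on the second function's input parameters. This step is optional.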
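After obtaining the second information, the slave core stores, based on the second slot identifier in it, the second VF's input parameters held in the parameter buffer into its Pong register. After the slave core finishes running the first VF with the first VF's input parameters in the Ping register, it copies the second VF's input parameters from the Pong register to the Ping register and runs the second VF using the parameters in the Ping register.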
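While the slave core is running the first VF, the master core may write the second VF's input parameters into the parameter buffer and the second information into the FIFO queue. After detecting the newly added second information in the FIFO queue, the slave core copies, based on the second slot identifier in it, the parameters stored in the corresponding second slot of the parameter buffer into the Pong scalar register. After finishing the first VF, the slave core copies the second VF's input parameters from the Pong scalar register into the Ping scalar register. After fetching the second VF based on the second information, the slave core runs the second VF using the input parameters stored in the Ping scalar register. It can be understood that, when there are more VFs, the slave core runs them by repeating the cycle described for the first and second VFs; details are not repeated here.
In one possible implementation, the input parameters of the first VF and the second VF are the same, and this embodiment may include steps 301 to 308. In another possible implementation, they differ, and this embodiment may include steps 301 to 306, step 309, and step 310. In yet another possible implementation, this embodiment may include steps 301 to 310.
In this embodiment, first, when determining that the input parameters of the second function are the same as those of the first function, the master core controls the slave core to run the second function based on the first function's input parameters. This reduces the overhead of transferring the second function's input parameters and the time the slave core takes to run the second function. Second, when determining that the input parameters are partially the same, the master core transfers the difference parameters and the first indication information to the slave core through the parameter buffer, and controls the slave core to run the second function based on them and the first function's input parameters. In the prior art, all of the second function's input parameters are transmitted; here, when enough parameters are shared, the difference parameters plus the first indication information occupy less storage than the full set of the second function's input parameters, which reduces the transfer overhead and the time the slave core takes to run the second function. Third, by dividing the storage into the FIFO queue and the parameter buffer, and into the Pong scalar register and the Ping scalar register, the slave core can obtain the second function's input parameters while it is still running the first function. That is, the two scalar registers are read and written alternately, which improves the efficiency with which the slave core runs multiple functions.
The communication method and related devices provided in the embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the foregoing description of the embodiments is intended only to help understand the method and core idea of this application. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and application scope based on the idea of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.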

Claims (14)

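  1. A processor, comprising:
    a master core, configured to transfer input parameters of a first function to a slave core;
    the master core is further configured to send first information to the slave core, wherein the first information is used by the slave core to obtain the first function;
    the slave core is configured to obtain the first function based on the first information, and run the first function based on the input parameters of the first function;
    the master core is further configured to send second information to the slave core, wherein the second information is used by the slave core to obtain a second function;
    the slave core is further configured to obtain the second function based on the second information; and
    the master core is further configured to: when determining that input parameters of the second function are the same as the input parameters of the first function, control the slave core to run the second function based on the input parameters of the first function.
  2. The processor according to claim 1, wherein the input parameters of the second function being the same as the input parameters of the first function comprises: the input parameters of the second function and the input parameters of the first function have some parameters in common;
    the master core is further configured to transfer difference parameters to the slave core, and control the slave core to run the second function based on the difference parameters and the input parameters of the first function, wherein the difference parameters are the parameters, among the input parameters of the second function, that differ from the input parameters of the first function.
  3. The processor according to claim 2, wherein the slave core is further configured to store the input parameters of the first function in a plurality of registers;
    the master core is further configured to transfer first indication information to the slave core, wherein the first indication information instructs the slave core to update the parameters stored in a first register to the difference parameters, the plurality of registers comprise the first register and a second register, and the second register stores the parameters in common; and
    the slave core is specifically configured to update, based on the first indication information, the parameters stored in the first register with the difference parameters, and run the second function using the difference parameters and the parameters in common as the input parameters of the second function.
  4. The processor according to claim 3, wherein the first indication information comprises a plurality of bits corresponding to the plurality of registers, and one bit of the plurality of bits indicates whether the slave core updates, with the difference parameters, the parameters stored in at least one register corresponding to that bit.
  5. The processor according to any one of claims 1 to 4, wherein the master core is further configured to: when determining that the input parameters of the second function differ from the input parameters of the first function, transfer the input parameters of the second function to the slave core, and control the slave core to run the second function based on the input parameters of the second function.
  6. The processor according to any one of claims 1 to 5, wherein the processor further comprises a first-in-first-out (FIFO) queue; and
    the master core is specifically configured to send the first information and the second information to the slave core through the FIFO queue.
  7. The processor according to any one of claims 1 to 6, wherein the processor further comprises a parameter buffer;
    the master core is specifically configured to transfer the input parameters of the first function through the parameter buffer;
    the slave core is further configured to store the input parameters of the first function held in the parameter buffer into a plurality of registers of the slave core;
    the first information comprises an address of the first function, a length of the first function, a first slot identifier, and second indication information, wherein the first slot identifier identifies a first slot in the parameter buffer in which the input parameters of the first function are stored, and the second indication information indicates whether to release the first slot after the slave core finishes running the first function;
    when the input parameters of the second function are the same as the input parameters of the first function, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first function, and a second slot identifier is wholly or partially the same as the first slot identifier;
    when the input parameters of the second function differ from the input parameters of the first function, the second indication information specifically indicates that the first slot is released after the slave core finishes running the first function; and
    the second information comprises an address of the second function, a length of the second function, and the second slot identifier, wherein the second slot identifier identifies a second slot in the parameter buffer in which the input parameters of the second function are stored.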
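  8. A communication method, applied to a processor, wherein the processor comprises a master core and a slave core, and the method comprises:
    transferring, by the master core, input parameters of a first function to the slave core;
    sending, by the master core, first information to the slave core, wherein the first information is used by the slave core to obtain the first function;
    obtaining, by the slave core, the first function based on the first information, and running the first function based on the input parameters of the first function;
    sending, by the master core, second information to the slave core, wherein the second information is used by the slave core to obtain a second function;
    obtaining, by the slave core, the second function based on the second information; and
    when determining that input parameters of the second function are the same as the input parameters of the first function, controlling, by the master core, the slave core to run the second function based on the input parameters of the first function.
  9. The method according to claim 8, wherein the input parameters of the second function being the same as the input parameters of the first function comprises: the input parameters of the second function and the input parameters of the first function have some parameters in common; and
    the method further comprises:
    transferring, by the master core, difference parameters to the slave core, and controlling the slave core to run the second function based on the difference parameters and the input parameters of the first function, wherein the difference parameters are the parameters, among the input parameters of the second function, that differ from the input parameters of the first function.
  10. The method according to claim 9, wherein the method further comprises:
    transferring, by the master core, first indication information to the slave core, wherein the first indication information instructs the slave core to update the parameters stored in a first register to the difference parameters, a plurality of registers comprise the first register and a second register, the second register stores the parameters in common, and the plurality of registers are used to store the input parameters of the first function; and
    updating, by the slave core based on the first indication information, the parameters stored in the first register with the difference parameters, and running the second function using the difference parameters and the parameters in common as the input parameters of the second function.
  11. The method according to claim 10, wherein the first indication information comprises a plurality of bits corresponding to the plurality of registers, and one bit of the plurality of bits indicates whether the slave core updates, with the difference parameters, the parameters stored in at least one register corresponding to that bit.
  12. The method according to any one of claims 8 to 11, wherein the method further comprises:
    when determining that the input parameters of the second function differ from the input parameters of the first function, transferring, by the master core, the input parameters of the second function to the slave core, and controlling the slave core to run the second function based on the input parameters of the second function.
  13. The method according to any one of claims 8 to 12, wherein the processor further comprises a first-in-first-out (FIFO) queue;
    the sending, by the master core, first information to the slave core comprises:
    sending, by the master core, the first information to the slave core through the FIFO queue; and
    the sending, by the master core, second information to the slave core comprises:
    sending, by the master core, the second information to the slave core through the FIFO queue.
  14. The method according to any one of claims 8 to 13, wherein the processor further comprises a parameter buffer;
    the transferring, by the master core, input parameters of a first function to the slave core comprises:
    transferring, by the master core, the input parameters of the first function through the parameter buffer;
    the method further comprises:
    storing, by the slave core, the input parameters of the first function held in the parameter buffer into a plurality of registers of the slave core;
    the first information comprises an address of the first function, a length of the first function, a first slot identifier, and second indication information, wherein the first slot identifier identifies a first slot in the parameter buffer in which the input parameters of the first function are stored, and the second indication information indicates whether to release the first slot after the slave core finishes running the first function;
    when the input parameters of the second function are the same as the input parameters of the first function, the second indication information specifically indicates that the parameters stored in the first slot are retained after the slave core finishes running the first function, and a second slot identifier is wholly or partially the same as the first slot identifier;
    when the input parameters of the second function differ from the input parameters of the first function, the second indication information specifically indicates that the first slot is released after the slave core finishes running the first function; and
    the second information comprises an address of the second function, a length of the second function, and the second slot identifier, wherein the second slot identifier identifies a second slot in the parameter buffer in which the input parameters of the second function are stored.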
PCT/CN2021/109934 2021-07-31 2021-07-31 Processor and communication method WO2023010232A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP21952142.4A EP4379564A1 (en) 2021-07-31 2021-07-31 Processor and communication method
CN202180101086.5A CN117751356A (zh) 2021-07-31 2021-07-31 Processor and communication method
PCT/CN2021/109934 WO2023010232A1 (zh) 2021-07-31 2021-07-31 Processor and communication method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109934 WO2023010232A1 (zh) 2021-07-31 2021-07-31 Processor and communication method

Publications (1)

Publication Number Publication Date
WO2023010232A1 true WO2023010232A1 (zh) 2023-02-09

Family

ID=85153996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109934 WO2023010232A1 (zh) 2021-07-31 2021-07-31 Processor and communication method

Country Status (3)

Country Link
EP (1) EP4379564A1 (zh)
CN (1) CN117751356A (zh)
WO (1) WO2023010232A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
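EP2624134A1 (en) * 2012-01-31 2013-08-07 MIMOON GmbH Method and apparatus for mapping a communication system on a multicore processor
CN105183698A (zh) * 2015-09-23 2015-12-23 Shanghai Radio Equipment Research Institute Control processing system and method based on a multi-core DSP
CN107046508A (zh) * 2016-02-05 2017-08-15 Huawei Technologies Co., Ltd. Packet receiving method and network device
CN110704193A (zh) * 2019-10-12 2020-01-17 The 38th Research Institute of China Electronics Technology Group Corporation Method and apparatus for implementing a multi-core software architecture suitable for vector processing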

Also Published As

Publication number Publication date
CN117751356A (zh) 2024-03-22
EP4379564A1 (en) 2024-06-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21952142

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180101086.5

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2021952142

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021952142

Country of ref document: EP

Effective date: 20240229