WO2022001317A1 - Chip emulation method, apparatus, device, system and storage medium - Google Patents

Chip emulation method, apparatus, device, system and storage medium

Info

Publication number
WO2022001317A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
data
processed
data processing
execution module
Prior art date
Application number
PCT/CN2021/089017
Other languages
English (en)
French (fr)
Inventor
徐帮元
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to EP21832822.7A (EP4170538A4)
Publication of WO2022001317A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053 Vector processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G06F30/3308 Design verification, e.g. functional simulation or model checking using simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G06F30/3308 Design verification, e.g. functional simulation or model checking using simulation
    • G06F30/331 Design verification, e.g. functional simulation or model checking using simulation with hardware acceleration, e.g. by using field programmable gate array [FPGA] or emulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00 Details relating to the type of the circuit
    • G06F2115/02 System on chip [SoC] design

Definitions

  • the embodiments of the present invention relate to the technical field of chip emulation, and specifically disclose a chip emulation method, apparatus, device, system and storage medium.
  • the CPU chip is simulated by building a corresponding CPU chip simulation environment.
  • there are two main ways to build a CPU chip simulation environment: one is to implement the CPU chip simulation algorithm directly in a hardware description language and integrate it into an application-specific integrated circuit to build the simulation environment; the other is to abstract the CPU chip in a high-level language and establish an object-oriented CPU chip simulation system.
  • Embodiments of the present invention provide a chip emulation method, apparatus, device, system, and storage medium.
  • an embodiment of the present invention provides a chip emulation method, which includes: sending vector-type pending data corresponding to chip emulation to a preset vector execution module; and obtaining a first data processing result of the data processing performed by the vector execution module on the vector-type pending data, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result.
  • an embodiment of the present invention provides a computer device, the computer device includes a memory and a processor; the memory is configured to store a computer program; the processor is configured to execute the computer program and implement the above chip emulation method when executing the computer program.
  • an embodiment of the present invention provides a chip emulation method, including: acquiring vector type pending data corresponding to chip emulation sent by a scalar execution module; performing data processing on the vector type pending data to obtain a first data processing result, so that the scalar execution module generates the simulation result corresponding to the chip simulation according to the first data processing result.
  • an embodiment of the present invention provides a chip emulation apparatus, which includes a memory and a processor; the memory is configured to store a computer program; the processor is configured to execute the computer program and implement the above-mentioned chip emulation method when executing the computer program.
  • an embodiment of the present invention further provides a chip emulation system, the chip emulation system includes the above-mentioned computer device and the above-mentioned chip emulation apparatus, and the computer device is communicatively connected to the chip emulation apparatus.
  • an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by the processor, the processor implements the above-mentioned chip emulation method.
  • FIG. 1 is a schematic block diagram of a chip simulation system provided by an embodiment of the present invention.
  • FIG. 2 is a schematic block diagram of a computer device according to an embodiment of the present invention.
  • FIG. 3 is a schematic block diagram of a chip emulation device provided by an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of steps of a chip simulation method provided by an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of steps of another chip simulation method provided by an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a CPU chip simulation provided by an embodiment of the present invention.
  • Embodiments of the present invention provide a chip emulation method, apparatus, device, system, and storage medium, so as to improve the efficiency of chip emulation.
  • the chip includes but is not limited to a CPU chip, for example, other integrated chips may also be included.
  • the CPU chip is taken as an example to explain each embodiment of the present invention.
  • FIG. 1 is a schematic block diagram of a chip emulation system provided by an embodiment of the present invention.
  • the chip emulation system 100 includes a computer device 10 and a chip emulation apparatus 20, between which a wired or wireless communication connection is established.
  • the computer device 10 includes a scalar execution module 11, wherein the scalar execution module 11 includes a soft emulation unit 111; the chip emulation apparatus 20 includes a vector execution module 21, wherein the vector execution module 21 includes a vector core unit 211.
  • the scalar execution module 11 is designed and implemented by a high-level language, and the vector execution module 21 is designed and implemented on a semi-customized hardware system to accelerate the operation of vector operations.
  • the vector execution module 21 includes, but is not limited to, a Field-Programmable Gate Array (FPGA), a graphics processor (Graphics Processing Unit, GPU) and other hardware systems that can provide vector operation optimization.
  • the vector core unit 211 is a hardware execution system optimized for vector operations. It can accelerate vector operations in the CPU simulation according to the scalar vector deployment of the CPU chip simulation, while ensuring the flexible construction of the scalar execution module 11 .
  • the scalar execution module 11 sends the configuration information corresponding to the CPU chip emulation to the vector execution module 21. After receiving the configuration information, the vector execution module 21 configures the vector core unit 211 according to it and, once the configuration is complete, sends the corresponding configuration completion information to the scalar execution module 11. After the scalar execution module 11 receives the configuration completion information, the soft emulation unit 111 starts to emulate the CPU chip system, and the vector-type data to be processed and the corresponding data parameters are continuously sent to the vector execution module 21.
  • the vector core unit 211 continuously obtains the vector-type data to be processed and the corresponding data parameters, performs data processing, and feeds the results back to the scalar execution module 11.
  • the scalar execution module 11 further includes a scheduling unit 112 and a message transmission unit 113, and the vector execution module 21 further includes a shared storage unit 212 and a cache identifier unit 213.
  • the scheduling unit 112 determines the construction mode of the vector core unit 211 according to the deployment situation, and sends the corresponding script file to the vector execution module 21, and the vector execution module 21 constructs the vector core unit 211 according to the script file.
  • the message transmission unit 113 continuously sends the vector-type data to be processed and the corresponding data parameters to the vector execution module 21, and the vector core unit 211 continuously obtains them, processes the data, and returns the result.
  • the message transmission unit 113 can optimize the scalar and vector division granularity, that is, optimize and configure the division granularity of the vector type data to be processed and the scalar type data to be processed, so as to give full play to the performance of scalar-vector separation and improve the simulation efficiency.
  • the shared storage unit 212 is connected to the scalar execution module 11 through a physical connection, for example, connected to the scalar execution module 11 through a corresponding physical interface, and the shared storage unit 212 stores vector type data to be processed.
  • the cache identifier unit 213 is configured to store the physical address and data parameters corresponding to the vector-type data to be processed, which are used to obtain the vector-type data to be processed and to perform data processing on it.
  • the scalar execution module 11 and the vector execution module 21 transmit data through a shared memory method mapped by physical addresses, which reduces the number of data transfers and further improves the efficiency of CPU chip emulation.
  • the simulation operations of the vector core unit 211 and the soft emulation unit 111 may be performed in parallel, that is, while the soft emulation unit 111 performs data processing on the scalar-type data to be processed, the vector core unit 211 performs data processing on the vector-type data to be processed, thereby improving the operating efficiency of the CPU chip simulation system.
  • FIG. 2 is a schematic block diagram of a computer device according to an embodiment of the present invention.
  • the computer device 200 includes a processor 201 and a memory 202, wherein the processor 201 and the memory 202 are connected by a bus.
  • the memory 202 may include a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium can store operating systems and computer programs.
  • the computer program includes program instructions that, when executed, can cause the processor to execute any chip emulation method.
  • the processor 201 is configured to provide computing and control capabilities to support the operation of the entire terminal device.
  • the internal memory provides an environment for running the computer program in the non-volatile storage medium.
  • the processor can execute any chip emulation method.
  • FIG. 2 is only a block diagram of a part of the structure related to the embodiments of the present invention, and does not constitute a limitation on the computer equipment to which the embodiments of the present invention are applied.
  • the specific computer device may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.
  • the processor 201 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • the processor is configured to run a computer program stored in the memory to implement the following steps:
  • before sending the vector type data to be processed corresponding to the chip emulation to the preset vector execution module, the processor is further configured to: send the configuration information corresponding to the chip emulation to the vector execution module, so that the vector execution module configures the vector core unit according to the configuration information and performs data processing on the vector type data to be processed through the vector core unit.
  • the vector execution module includes a shared storage unit, and when sending the vector type data to be processed corresponding to the chip emulation to the preset vector execution module, the processor is configured to: send the vector type data to be processed to the shared storage unit, so that the vector core unit obtains the vector type data to be processed from the shared storage unit.
  • when acquiring the first data processing result of the data processing performed by the vector execution module on the vector type data to be processed, the processor is configured to: acquire the first data processing result stored in the shared storage unit.
  • the vector execution module further includes a cache identifier unit, and the processor is further configured to: send the data parameters corresponding to the vector type data to be processed to the cache identifier unit, so that the vector core unit determines the processing mode of the vector type data to be processed according to the data parameters and performs data processing of that processing mode on the vector type data to be processed.
  • the processor is further configured to implement: perform data processing on the scalar data to be processed corresponding to the chip emulation to obtain a second data processing result;
  • after obtaining the first data processing result of the data processing performed by the vector execution module on the vector type data to be processed, the processor is further configured to implement:
  • a simulation result is generated according to the first data processing result and the second data processing result.
  • the processor is further configured to: configure the granularity of division of the vector type pending data and the scalar type pending data.
  • FIG. 3 is a schematic block diagram of a chip emulation apparatus provided by an embodiment of the present invention.
  • the chip emulation apparatus 300 includes a processor 301 and a memory 302, wherein the processor 301 and the memory 302 are connected through a bus.
  • the memory 302 may include a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium can store operating systems and computer programs.
  • the computer program includes program instructions that, when executed, can cause the processor to execute any chip emulation method.
  • the processor 301 is configured to provide computing and control capabilities to support the operation of the entire terminal device.
  • the internal memory provides an environment for running the computer program in the non-volatile storage medium.
  • the processor can execute any chip emulation method.
  • FIG. 3 is only a block diagram of a partial structure related to the solution of the embodiment of the present invention, and does not constitute a limitation on the chip emulation device to which the solution of the embodiment of the present invention is applied.
  • An apparatus may include more or fewer components than those shown in the figures, or combine certain components, or have a different arrangement of components.
  • the processor 301 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • the processor is configured to run a computer program stored in the memory to implement the following steps:
  • Data processing is performed on the vector type data to be processed to obtain a first data processing result, which is used by the scalar execution module to generate a simulation result corresponding to the chip simulation according to the first data processing result.
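  • when obtaining the vector type data to be processed corresponding to the chip emulation sent by the scalar execution module, the processor is configured to: acquire the physical address corresponding to the vector-type data to be processed stored in the cache identifier unit; and perform data indexing in the shared storage unit according to the physical address to obtain the vector-type data to be processed, wherein the scalar execution module sends the vector-type data to be processed to the shared storage unit.
  • when performing data processing on the vector-type data to be processed, the processor is configured to: acquire the data parameters corresponding to the vector-type data to be processed stored in the cache identifier unit, wherein the scalar execution module sends the data parameters to the cache identifier unit; and determine the processing mode of the vector-type data to be processed according to the data parameters and perform data processing of that processing mode on the vector-type data to be processed.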
  • the processor is further configured to implement:
  • the first data processing result is stored in the shared storage unit, and the physical address corresponding to the first data processing result is stored in the cache identifier unit.
  • The chip emulation method provided by the embodiments of the present invention will be described in detail below with reference to the chip emulation system in FIG. 1, the computer device in FIG. 2, and the chip emulation apparatus in FIG. 3. It should be noted that the above-mentioned chip emulation system, computer device and chip emulation apparatus do not limit the application scenarios of the chip emulation method provided by the embodiments of the present invention.
  • FIG. 4 is a schematic flowchart of steps of a chip simulation method provided by an embodiment of the present invention. The method can be used in the above computer equipment to improve the efficiency of chip simulation.
  • the method includes steps S101 to S102.
  • scalar and vector operations correspond to the corresponding scalar-type pending data and vector-type pending data.
  • the scalar-type pending data and vector-type pending data are distinguished.
  • the vector execution module is designed and implemented on a semi-custom hardware system, which can accelerate the operation of vector operations.
  • the vector execution module includes, but is not limited to, hardware systems such as FPGA and GPU, which can provide vector operation functions.
  • the vector execution module After the vector execution module receives and obtains the vector-type to-be-processed data, it performs corresponding data processing on the vector-type to-be-processed data.
  • the data processing includes operations such as data time domain/frequency domain conversion, data encoding and decoding, and the like.
  • before step S101, the method may further include: sending configuration information corresponding to the chip emulation to the vector execution module, so that the vector execution module configures the vector core unit according to the configuration information and performs data processing on the vector type data to be processed through the vector core unit.
  • the vector core unit is the core unit of the vector execution module: a hardware execution system optimized for vector operations that is configured to perform data processing on the vector-type data to be processed and can accelerate the vector operations in the CPU simulation.
  • the configuration information corresponding to the CPU chip emulation is sent to the vector execution module, and after receiving the configuration information, the vector execution module configures its vector core unit according to the configuration information. For example, taking the configuration of a digital signal processing (Digital Signal Processing, DSP) core execution unit as an example, the DSP core configuration information for communication application chip simulation is sent to the vector execution module, and the vector execution module builds, according to the configuration information, a DSP core execution unit configured to handle large-scale vector operations in mobile communications.
  • the construction method of the vector core unit in the vector execution module is determined according to the deployment situation, and a corresponding script file is sent to the vector execution module. After receiving the script file, the vector execution module configures the vector core unit according to the script file.
  • the vector execution module feeds back the corresponding configuration completion information.
  • the CPU chip simulation starts, and the corresponding vector-type pending data is sent to the vector execution module.
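The configuration sequence described above (send configuration, configure the vector core unit, feed back completion, then start sending data) can be sketched roughly as follows. This is a minimal illustration only: the class `VectorExecutionModule`, the function `start_simulation`, and the `"CONFIG_DONE"` token are hypothetical names that do not come from the patent, and the real vector execution module is hardware (for example an FPGA or GPU), not a Python object.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VectorExecutionModule:
    """Hypothetical software stand-in for the vector execution module."""
    core_config: Optional[dict] = None
    received: List[list] = field(default_factory=list)

    def configure(self, config: dict) -> str:
        # Configure the vector core unit, then feed back completion information.
        self.core_config = dict(config)
        return "CONFIG_DONE"

    def submit(self, vector_data: list) -> None:
        # Vector-type pending data is only accepted after configuration.
        if self.core_config is None:
            raise RuntimeError("vector core unit is not configured")
        self.received.append(vector_data)


def start_simulation(vem: VectorExecutionModule, config: dict, batches: List[list]) -> int:
    """Scalar side: configure first, wait for completion, then send data."""
    ack = vem.configure(config)
    if ack != "CONFIG_DONE":
        raise RuntimeError("configuration was not completed")
    for batch in batches:
        vem.submit(batch)
    return len(vem.received)
```

The point of the sketch is the ordering constraint: no vector-type pending data is sent until the completion information has come back.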
  • the data processing result obtained by performing data processing on the vector type data to be processed is referred to as the first data processing result.
  • S102 Obtain a first data processing result of data processing performed by the vector execution module on the vector type data to be processed, so as to generate a simulation result corresponding to chip simulation according to the first data processing result.
  • the vector core unit performs the corresponding data processing on the vector type data to be processed to obtain the first data processing result, which is then acquired. For example, after performing data processing to obtain the first data processing result, the vector execution module returns the first data processing result, and the first data processing result returned by the vector execution module is received directly.
  • the vector execution module saves the first data processing result to the corresponding storage device, so the first data processing result can be obtained by querying the storage device.
  • the first data processing result is used as intermediate information obtained by performing the CPU chip simulation. Based on the first data processing result, it is determined whether the current CPU chip simulation is completed, and a corresponding simulation result is generated.
  • while the vector type data to be processed is sent to the vector execution module, data processing is performed on the corresponding scalar type data to be processed to obtain a data processing result corresponding to the scalar type data to be processed.
  • the data processing result obtained by performing data processing on the scalar type data to be processed is referred to as the second data processing result.
  • while the vector execution module performs data processing on the vector-type data to be processed, the computer device itself performs data processing on the scalar-type data to be processed, so that the first data processing result corresponding to the vector-type data to be processed and the second data processing result corresponding to the scalar-type data to be processed are both obtained.
  • the obtained first data processing result and second data processing result are comprehensively analyzed to generate the simulation result corresponding to the CPU chip simulation.
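As a rough sketch of the flow above, the vector-type data can be handed off while the scalar-type data is processed locally, and the two results are then combined. The functions `vector_process`, `scalar_process`, and `simulate_step`, the placeholder arithmetic, and the use of a thread pool in place of real vector hardware are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_process(vector_data):
    # Placeholder for the vector execution module: element-wise doubling.
    return [2 * x for x in vector_data]


def scalar_process(scalar_data):
    # Placeholder for the scalar side's own processing: increment by one.
    return scalar_data + 1


def simulate_step(vector_data, scalar_data):
    """Offload the vector work, process the scalar work meanwhile, then combine."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        first_future = pool.submit(vector_process, vector_data)  # offloaded
        second_result = scalar_process(scalar_data)              # done locally
        first_result = first_future.result()                     # wait for the vector side
    # The "simulation result" here is simply both results side by side.
    return {"first": first_result, "second": second_result}
```

For example, `simulate_step([1, 2, 3], 4)` returns `{"first": [2, 4, 6], "second": 5}`.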
  • the vector execution module further includes a shared storage unit configured to store data to be processed and data processing results corresponding to the data to be processed.
  • Sending the vector-type pending data corresponding to the chip emulation to the preset vector execution module may include: sending the vector-type pending data to the shared storage unit, so that the vector core unit obtains the vector-type pending data from the shared storage unit;
  • Acquiring the first data processing result of data processing performed by the vector execution module on the vector type data to be processed may include: acquiring the first data processing result stored in the shared storage unit.
  • the vector-type pending data is stored in the shared storage unit, and the vector core unit can directly obtain the stored vector-type pending data from the shared storage unit.
  • after the vector type data to be processed is processed and the corresponding first data processing result is obtained, the first data processing result is saved to the shared storage unit. After that, the first data processing result corresponding to the vector type data to be processed can be obtained by accessing the shared storage unit.
  • the vector execution module further includes a cache identifier unit, and the cache identifier unit can be configured to store data parameters corresponding to the vector-type data to be processed, wherein the data parameters include but are not limited to data size, data attributes, data type of operation, etc.
  • the data parameter corresponding to the vector-type to-be-processed data is sent to the cache identifier unit.
  • the vector core unit obtains the stored vector type data to be processed from the shared storage unit, obtains the data parameters corresponding to the vector type data to be processed from the cache identifier unit, determines the processing mode of the vector type data to be processed according to those data parameters, and then performs data processing of that processing mode on the vector type data to be processed.
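Choosing a processing mode from the data parameters can be pictured as a lookup from an operation type to a routine. The sketch below is an assumption-laden illustration: the `DataParams` fields, the mode names `"scale"` and `"reverse"`, and the routines themselves are invented here; the patent only says the parameters include items such as data size, data attributes, and data operation type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataParams:
    """Hypothetical data parameters held by the cache identifier unit."""
    size: int      # number of elements expected
    op_type: str   # which processing mode to apply


def _scale(data: List[float]) -> List[float]:
    return [2.0 * x for x in data]


def _reverse(data: List[float]) -> List[float]:
    return list(reversed(data))


# Hypothetical table of processing modes keyed by operation type.
PROCESSING_MODES: Dict[str, Callable[[List[float]], List[float]]] = {
    "scale": _scale,
    "reverse": _reverse,
}


def process_vector(data: List[float], params: DataParams) -> List[float]:
    """Pick the processing mode from the parameters, then apply it."""
    if len(data) != params.size:
        raise ValueError("data size does not match the data parameters")
    mode = PROCESSING_MODES.get(params.op_type)
    if mode is None:
        raise ValueError(f"unknown operation type: {params.op_type}")
    return mode(data)
```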
  • the physical address corresponding to the vector type data to be processed stored in the shared storage unit is stored in the cache identifier unit.
  • the vector core unit can perform data indexing in the shared storage unit according to the physical address, obtain the corresponding vector-type pending data, and then perform data processing on the vector-type pending data. That is, data transfer is carried out through the shared memory method of physical address mapping, which reduces the number of data transfers, thereby improving the running speed of the simulation.
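The address-based handoff can be sketched as follows: the data is written once into a shared buffer, and only a small descriptor (address and element count) is passed to the consumer. Everything named here is a hypothetical model: `SharedStorage`, `publish`, and `fetch` are invented, a `bytearray` offset stands in for a physical address, and a `deque` stands in for the cache identifier unit.

```python
import struct
from collections import deque


class SharedStorage:
    """Hypothetical shared storage unit backed by one flat byte buffer."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)
        self.next_free = 0

    def write_floats(self, values) -> int:
        # Pack as little-endian doubles and return the start address.
        raw = struct.pack(f"<{len(values)}d", *values)
        addr = self.next_free
        if addr + len(raw) > len(self.mem):
            raise MemoryError("shared storage is full")
        self.mem[addr:addr + len(raw)] = raw
        self.next_free += len(raw)
        return addr

    def read_floats(self, addr: int, count: int):
        return list(struct.unpack_from(f"<{count}d", self.mem, addr))


def publish(storage: SharedStorage, cache_ids: deque, values) -> None:
    """Scalar side: store the data once, record only (address, count)."""
    addr = storage.write_floats(values)
    cache_ids.append((addr, len(values)))


def fetch(storage: SharedStorage, cache_ids: deque):
    """Vector side: look up the address, then index into shared storage."""
    addr, count = cache_ids.popleft()
    return storage.read_floats(addr, count)
```

In this model the payload is never copied into a message; only the descriptor moves between the two sides, which is the property the text credits for reducing the number of data transfers.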
  • the vector-type data to be processed and its corresponding data parameters may also be sent to the shared storage unit together, where they are stored in association with each other.
  • the vector core unit can obtain the associated vector-type data to be processed and its data parameters from the shared storage unit, determine the processing mode of the vector-type data to be processed according to the data parameters, and then perform data processing of that processing mode on the vector-type data to be processed.
  • the division granularity of the vector-type data to be processed and the scalar-type data to be processed is configured periodically or when a preset condition is reached, that is, the scalar and vector division granularity is optimized. For example, when each call to the cache identifier unit consumes the same amount of time, scalar and vector processes are merged to optimize the division granularity and reduce the number of calls to the cache identifier unit.
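The effect of division granularity on call count can be illustrated with a toy cost model. It assumes, as in the example above, that every call to the cache identifier unit costs the same fixed amount; the functions `split_into_batches` and `total_cost` and the cost units are hypothetical and not taken from the patent.

```python
import math
from typing import List


def split_into_batches(data: List[int], batch_size: int) -> List[List[int]]:
    """Divide the pending data into chunks of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def total_cost(n_elements: int, batch_size: int, call_cost: int, element_cost: int) -> int:
    """Toy model: fixed cost per call plus a cost per processed element."""
    calls = math.ceil(n_elements / batch_size)
    return calls * call_cost + n_elements * element_cost
```

Under this model, coarser granularity lowers the fixed-overhead term: with 1000 elements, a call cost of 5 and an element cost of 1, batches of 10 cost 1500 units while batches of 100 cost 1050.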
  • FIG. 5 is a schematic flowchart of a chip simulation method provided by another embodiment of the present invention.
  • the chip emulation method can be applied to a chip emulation apparatus, and includes step S201 and step S202.
  • the scalar execution module configured in the computer device is used to distinguish the scalar-type data to be processed from the vector-type data to be processed, and the vector-type data to be processed so determined is sent to the chip emulation apparatus.
  • the scalar execution module is designed and implemented by high-level language.
  • a vector execution module is preset in the chip emulation device, wherein the vector execution module is designed and implemented on a semi-custom hardware system, which can speed up the operation of vector operations.
  • the vector execution module includes, but is not limited to, hardware systems such as FPGA and GPU, which can provide vector operation functions.
  • the chip emulation device receives, through the vector execution module, the vector type to-be-processed data sent by the scalar execution module.
  • the vector execution module includes a vector core unit, and the vector core unit receives the vector-type data to be processed sent by the scalar execution module.
  • acquiring the vector-type pending data corresponding to the chip emulation sent by the scalar execution module may include: acquiring a physical address corresponding to the vector-type pending data stored in the cache identifier unit; and performing data indexing in the shared storage unit according to the physical address to obtain the vector-type data to be processed, wherein the scalar execution module sends the vector-type data to be processed to the shared storage unit.
  • the vector execution module further includes a shared storage unit and a cache identifier unit, the shared storage unit stores vector-type data to be processed, and the cache identifier unit stores a physical address corresponding to the vector-type to-be-processed data.
  • the corresponding vector type data to be processed is obtained by acquiring the physical address corresponding to the vector type pending data stored in the cache identifier unit, and then performing data indexing in the shared storage unit based on the physical address corresponding to the vector type pending data. That is, data transfer is carried out through the shared memory method of physical address mapping, which reduces the number of data transfers, thereby improving the running speed of the simulation.
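The address-based lookup described above can be sketched in software as follows. This is only an illustration under stated assumptions: a `bytearray` stands in for the shared storage unit, byte offsets stand in for physical addresses, and the class and field names (`Descriptor`, `SharedStorageUnit`, `CacheIdentifierUnit`) are hypothetical rather than the patent's implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Descriptor:
    address: int  # offset into the shared storage unit (stand-in for a physical address)
    size: int     # payload length in bytes


class SharedStorageUnit:
    """Flat buffer written by the scalar side and read in place by the vector side."""

    def __init__(self, capacity: int) -> None:
        self._mem = bytearray(capacity)
        self._next_free = 0

    def write(self, payload: bytes) -> int:
        address = self._next_free
        end = address + len(payload)
        if end > len(self._mem):
            raise MemoryError("shared storage unit is full")
        self._mem[address:end] = payload
        self._next_free = end
        return address

    def view(self, address: int, size: int) -> memoryview:
        # A memoryview exposes the stored bytes without copying them.
        return memoryview(self._mem)[address:address + size]


class CacheIdentifierUnit:
    """Holds descriptors (address and size), not the payload itself."""

    def __init__(self) -> None:
        self._pending: Deque[Descriptor] = deque()

    def push(self, descriptor: Descriptor) -> None:
        self._pending.append(descriptor)

    def pop(self) -> Descriptor:
        return self._pending.popleft()
```

Only the small descriptor passes through the cache identifier unit; the payload is written once and read in place, which is the sense in which the number of data moves is reduced.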
  • After the vector-type data to be processed is obtained, it is processed to produce a corresponding data processing result. For ease of description, this result is referred to below as the first data processing result.
  • In some embodiments, performing data processing on the vector-type data to be processed may include: acquiring the data parameters, stored in the cache identifier unit, that correspond to the vector-type data to be processed, wherein the scalar execution module sends the data parameters to the cache identifier unit; and determining a processing mode for the vector-type data according to the data parameters and processing the vector-type data in that mode.
  • The data parameters corresponding to the vector-type data to be processed include, but are not limited to, data size, data attributes, and data operation type.
  • The cache identifier unit can also be configured to store the data parameters corresponding to the vector-type data to be processed.
  • The scalar execution module sends the data parameters to the cache identifier unit at the same time as it sends the vector-type data to be processed to the shared storage unit.
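Choosing a processing mode from the data parameters amounts to a dispatch on the operation type. The sketch below shows one way this might look; the parameter fields, the operation names `scale2` and `reverse`, and the handler table are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataParams:
    size: int        # number of elements in the vector
    attribute: str   # e.g. an element type label (hypothetical)
    operation: str   # name of the requested operation (hypothetical)


def _scale2(values: List[int]) -> List[int]:
    return [2 * x for x in values]


def _reverse(values: List[int]) -> List[int]:
    return list(reversed(values))


# Table mapping an operation type to the handler that implements it.
PROCESSING_MODES: Dict[str, Callable[[List[int]], List[int]]] = {
    "scale2": _scale2,
    "reverse": _reverse,
}


def process_vector(data: List[int], params: DataParams) -> List[int]:
    """Pick the processing mode from the data parameters, then apply it."""
    if len(data) != params.size:
        raise ValueError("data length does not match the size parameter")
    handler = PROCESSING_MODES.get(params.operation)
    if handler is None:
        raise ValueError(f"unsupported operation: {params.operation}")
    return handler(data)
```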
  • In some embodiments, after the vector-type data to be processed has been processed and the first data processing result obtained, the method may include: saving the first data processing result to the shared storage unit, and saving the physical address corresponding to the first data processing result to the cache identifier unit.
  • After the first data processing result is obtained, it is not returned directly to the scalar execution module. Instead, the first data processing result is saved to the shared storage unit, and the physical address at which it is stored in the shared storage unit is saved to the cache identifier unit.
  • The scalar execution module then obtains the first data processing result by acquiring its physical address from the cache identifier unit and indexing into the shared storage unit.
  • The scalar execution module generates a simulation result corresponding to the CPU chip simulation according to the first data processing result.
  • In some examples, the scalar execution module performs data processing on the corresponding scalar-type data to be processed to obtain the corresponding data processing result.
  • For ease of description, the data processing result obtained by processing scalar-type data is referred to below as the second data processing result.
  • The scalar execution module analyzes the first data processing result and the second data processing result together and generates the simulation result corresponding to the CPU chip simulation.
  • Step1: The scalar execution module sends configuration information to the vector execution module;
  • Step2: The vector execution module completes the configuration of the vector core unit according to the configuration information;
  • Step3: The scalar execution module continuously updates and sends the vector-type data to be processed and the corresponding data parameters;
  • Step4: The vector core unit continuously obtains the vector-type data to be processed and the corresponding data parameters;
  • Step5: The vector core unit performs data processing on the vector-type data to be processed and feeds the results back to the scalar execution module;
  • Step6: The scalar execution module determines from the feedback whether the simulation is complete; if so, Step7 is executed; otherwise, execution returns to Step3;
  • Step7: The simulation results are output.
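Step1 to Step7 can be read as a configure-then-loop control flow. The following is a minimal software sketch of that flow, assuming a stand-in vector module that only scales its input; the `gain` configuration field, the `size` parameter, and the completion test (all batches sent) are hypothetical placeholders, since the patent leaves these details open.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]


class VectorExecutionModule:
    """Software stand-in for the hardware vector execution module."""

    def __init__(self) -> None:
        self._kernel: Optional[Callable[[Vector], Vector]] = None

    def configure(self, config: Dict[str, float]) -> bool:
        # Step2: build the vector core unit from the configuration information.
        gain = config["gain"]  # hypothetical configuration field
        self._kernel = lambda v: [gain * x for x in v]
        return True

    def process(self, data: Vector, params: Dict[str, int]) -> Vector:
        # Step4 and Step5: take the data and parameters, process, feed back.
        if self._kernel is None:
            raise RuntimeError("vector core unit is not configured")
        if len(data) != params["size"]:
            raise ValueError("data does not match its size parameter")
        return self._kernel(data)


def run_simulation(batches: List[Vector], config: Dict[str, float]) -> List[Vector]:
    module = VectorExecutionModule()
    if not module.configure(config):      # Step1 and Step2
        raise RuntimeError("configuration failed")
    results: List[Vector] = []
    sent = 0
    while sent < len(batches):            # Step6: placeholder completion test
        batch = batches[sent]             # Step3: next vector-type data
        results.append(module.process(batch, {"size": len(batch)}))
        sent += 1
    return results                        # Step7: output the simulation results
```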
  • In the above embodiments, the vector-type data to be processed corresponding to the chip simulation is sent to the preset vector execution module, which processes it to obtain the corresponding first data processing result. The simulation result corresponding to the chip simulation is generated from the first data processing result; that is, the vector operations in the chip simulation are separated out and processed independently, which improves the efficiency of the chip simulation.
  • Embodiments of the present invention also provide a computer-readable storage medium that stores a computer program. The computer program includes program instructions, and a processor executes the program instructions to implement any one of the chip simulation methods provided in the embodiments of the present invention.
  • For example, the computer program, when loaded by the processor, can perform the following steps: sending vector-type data to be processed corresponding to the chip simulation to a preset vector execution module; and acquiring a first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result.
  • The embodiments of the present invention disclose a chip simulation method, apparatus, device, and storage medium. The vector-type data to be processed corresponding to the chip simulation is sent to the preset vector execution module and processed there to obtain the corresponding first data processing result, and the simulation result corresponding to the chip simulation is generated from the first data processing result; that is, the vector operations in the chip simulation are separated out and processed independently, which improves the efficiency of the chip simulation.
  • The computer-readable storage medium may be an internal storage unit of the chip simulation system of the foregoing embodiments, such as a hard disk or a memory of the chip simulation system.
  • The computer-readable storage medium may also be an external storage device of the chip simulation system, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the chip simulation system.


Abstract

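A chip simulation method, apparatus, device, system, and storage medium, belonging to the technical field of chip simulation. The method comprises: sending vector-type data to be processed corresponding to a chip simulation to a preset vector execution module (S101); and acquiring a first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result (S102).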

Description

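Chip simulation method, apparatus, device, system, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202010599906.5, filed on June 28, 2020, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present invention relate to the technical field of chip simulation, and specifically disclose a chip simulation method, apparatus, device, system, and storage medium.
Background
With the development of technology, ever higher processing capability is demanded of chips (such as CPU chips), which makes chip simulation very important. Taking a CPU chip as an example, the chip is simulated by building a corresponding CPU chip simulation environment. At present, there are two main ways to build such an environment: one implements the CPU chip simulation algorithm directly in a hardware description language and synthesizes it into an application-specific integrated circuit to build the simulation environment; the other abstracts the CPU chip in a high-level language and builds an object-oriented CPU chip simulation system.
CPU chip simulation requires scalar and vector operations, which affect how fast the simulation runs. Vector operations in particular severely slow the simulation down, so CPU chip simulation is inefficient.
Summary
Embodiments of the present invention provide a chip simulation method, apparatus, device, system, and storage medium.
In a first aspect, an embodiment of the present invention provides a chip simulation method, comprising: sending vector-type data to be processed corresponding to a chip simulation to a preset vector execution module; and acquiring a first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result.
In a second aspect, an embodiment of the present invention provides a computer device comprising a memory and a processor; the memory is configured to store a computer program; and the processor is configured to execute the computer program and, when executing the computer program, implement the chip simulation method described above.
In a third aspect, an embodiment of the present invention provides a chip simulation method, comprising: acquiring vector-type data to be processed, corresponding to a chip simulation, sent by a scalar execution module; and performing data processing on the vector-type data to be processed to obtain a first data processing result, so that the scalar execution module generates a simulation result corresponding to the chip simulation according to the first data processing result.
In a fourth aspect, an embodiment of the present invention provides a chip simulation apparatus comprising a memory and a processor; the memory is configured to store a computer program; and the processor is configured to execute the computer program and, when executing the computer program, implement the chip simulation method described above.
In a fifth aspect, an embodiment of the present invention further provides a chip simulation system comprising the computer device described above and the chip simulation apparatus described above, the computer device being communicatively connected to the chip simulation apparatus.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to implement the chip simulation method described above.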
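Brief description of the drawings
FIG. 1 is a schematic block diagram of a chip simulation system provided by an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a computer device provided by an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a chip simulation apparatus provided by an embodiment of the present invention;
FIG. 4 is a schematic flowchart of the steps of a chip simulation method provided by an embodiment of the present invention;
FIG. 5 is a schematic flowchart of the steps of another chip simulation method provided by an embodiment of the present invention;
FIG. 6 is a schematic flowchart of CPU chip simulation provided by an embodiment of the present invention.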
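Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must the operations/steps be performed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual order of execution may change according to the actual situation.
It should be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the embodiments and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
Embodiments of the present invention provide a chip simulation method, apparatus, device, system, and storage medium to improve the efficiency of chip simulation. The chip includes, but is not limited to, a CPU chip and may, for example, be another integrated chip; the embodiments are explained here using a CPU chip as an example.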
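Referring to FIG. 1, FIG. 1 is a schematic block diagram of a chip simulation system provided by an embodiment of the present invention. The chip simulation system 100 includes a computer device 10 and a chip simulation apparatus 20, between which a wired or wireless communication connection is established.
The computer device 10 includes a scalar execution module 11, which includes a soft simulation unit 111; the chip simulation apparatus 20 includes a vector execution module 21, which includes a vector core unit 211. The scalar execution module 11 is designed and implemented in a high-level language, while the vector execution module 21 is designed and implemented on a semi-custom hardware system to accelerate vector operations.
In some examples, the vector execution module 21 includes, but is not limited to, hardware systems that can provide optimized vector operations, such as a field-programmable gate array (FPGA) or a graphics processing unit (GPU).
The vector core unit 211 is a hardware execution system optimized specifically for vector operations. It can accelerate the vector operations in CPU simulation according to the scalar/vector deployment of the CPU chip simulation, while preserving the flexible construction of the scalar execution module 11.
The scalar execution module 11 sends configuration information corresponding to the CPU chip simulation to the vector execution module 21. On receiving the configuration information, the vector execution module 21 configures the vector core unit 211 accordingly and, once configuration is complete, sends a configuration-complete message to the scalar execution module 11. After the scalar execution module 11 receives this message, the soft simulation unit 111 starts the CPU chip system simulation and continuously sends the vector execution module 21 vector-type data to be processed together with the corresponding data parameters, which include, but are not limited to, data size, data attributes, and data operation type. The vector core unit 211 continuously acquires the vector-type data to be processed and the corresponding data parameters, performs data processing, and feeds the results back to the scalar execution module 11; the scalar execution module 11 decides, based on the feedback, whether to end the simulation and output the simulation result.
In some examples, the scalar execution module 11 further includes a scheduling unit 112 and a message transmission unit 113, and the vector execution module 21 further includes a shared storage unit 212 and a cache identifier unit 213.
After the scalar/vector deployment of the CPU chip simulation is determined, the scheduling unit 112 determines how the vector core unit 211 is to be built according to the deployment and sends a corresponding script file to the vector execution module 21, which builds the vector core unit 211 according to the script file.
The message transmission unit 113 continuously sends the vector execution module 21 vector-type data to be processed and the corresponding data parameters, and the vector core unit 211 continuously acquires them, processes the data, and returns the results. Further, the message transmission unit 113 can optimize the scalar/vector division granularity, that is, optimize the configuration of the granularity at which data to be processed is divided into vector-type and scalar-type data, so as to make full use of scalar/vector separation and improve simulation efficiency.
The shared storage unit 212 is physically connected to the scalar execution module 11, for example through a corresponding physical interface, and stores the vector-type data to be processed.
The cache identifier unit 213 is configured to store the physical address and data parameters corresponding to the vector-type data to be processed. The vector core unit 211 indexes into the shared storage unit 212 according to the physical address cached in the cache identifier unit 213, acquires the corresponding vector-type data, and processes it.
The scalar execution module 11 and the vector execution module 21 transfer data through shared memory with physical address mapping, which reduces the number of data moves and can further improve the efficiency of CPU chip simulation.
In some examples, the simulation operations of the vector core unit 211 and the soft simulation unit 111 can run in parallel: the soft simulation unit 111 processes scalar-type data while the vector core unit 211 processes vector-type data, which improves the running efficiency of the CPU chip simulation system.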
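The overlap between scalar and vector processing can be sketched with a worker thread standing in for the vector core unit. This is an illustration only: `process_scalar` and `process_vector` are hypothetical placeholders for the soft simulation unit and the vector core unit, and in the described system the vector side runs on separate hardware rather than in a Python thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def process_scalar(values: List[int]) -> int:
    """Placeholder for the scalar-side work done by the soft simulation unit."""
    return sum(values)


def process_vector(vectors: List[List[int]]) -> List[List[int]]:
    """Placeholder for the vector-side work done by the vector core unit."""
    return [[x * x for x in v] for v in vectors]


def simulate_step(scalar_data: List[int],
                  vector_data: List[List[int]]) -> Tuple[List[List[int]], int]:
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Hand the vector-type data off first, then continue with scalar work.
        vector_future = pool.submit(process_vector, vector_data)
        second_result = process_scalar(scalar_data)
        first_result = vector_future.result()  # wait for the vector side to finish
    return first_result, second_result
```

The sketch shows only the structure: the vector work is submitted, the scalar work proceeds, and the two results are joined afterwards. With pure-Python workloads the interpreter lock limits real speedup, so the gain in the described system comes from the dedicated vector hardware.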
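Referring to FIG. 2, FIG. 2 is a schematic block diagram of a computer device provided by an embodiment of the present invention. The computer device 200 includes a processor 201 and a memory 202, which are connected by a bus.
The memory 202 may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any one of the chip simulation methods.
The processor 201 is configured to provide computing and control capabilities and to support the operation of the entire terminal device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any one of the chip simulation methods.
It can be understood that the structure shown in FIG. 2 is only a block diagram of the part of the structure related to the solution of the embodiments and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
It should be understood that the processor 201 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the processor is configured to run the computer program stored in the memory to implement the following steps:
sending vector-type data to be processed corresponding to the chip simulation to a preset vector execution module;
acquiring a first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result.
In some embodiments, before sending the vector-type data to be processed corresponding to the chip simulation to the preset vector execution module, the processor is further configured to implement: sending configuration information corresponding to the chip simulation to the vector execution module, so that the vector execution module configures a vector core unit according to the configuration information and the vector core unit performs data processing on the vector-type data to be processed.
In some embodiments, the vector execution module includes a shared storage unit, and when sending the vector-type data to be processed corresponding to the chip simulation to the preset vector execution module, the processor is configured to implement:
sending the vector-type data to be processed to the shared storage unit, so that the vector core unit acquires the vector-type data to be processed from the shared storage unit;
when acquiring the first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, the processor is configured to implement:
acquiring the first data processing result stored in the shared storage unit.
In some embodiments, the vector execution module further includes a cache identifier unit, and the processor is further configured to implement:
sending the data parameters corresponding to the vector-type data to be processed to the cache identifier unit, so that the vector core unit acquires the data parameters from the cache identifier unit, determines a processing mode for the vector-type data according to the data parameters, and processes the vector-type data in that mode.
In some embodiments, the processor is further configured to implement: performing data processing on scalar-type data to be processed corresponding to the chip simulation to obtain a second data processing result;
after acquiring the first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, the processor is further configured to implement:
generating the simulation result according to the first data processing result and the second data processing result.
In some embodiments, the processor is further configured to implement: configuring the division granularity between the vector-type data to be processed and the scalar-type data to be processed.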
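Referring to FIG. 3, FIG. 3 is a schematic block diagram of a chip simulation apparatus provided by an embodiment of the present invention. The chip simulation apparatus 300 includes a processor 301 and a memory 302, which are connected by a bus.
The memory 302 may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any one of the chip simulation methods.
The processor 301 is configured to provide computing and control capabilities and to support the operation of the entire terminal device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any one of the chip simulation methods.
It can be understood that the structure shown in FIG. 3 is only a block diagram of the part of the structure related to the solution of the embodiments and does not limit the chip simulation apparatus to which the solution is applied; a specific chip simulation apparatus may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
It should be understood that the processor 301 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the processor is configured to run the computer program stored in the memory to implement the following steps:
acquiring vector-type data to be processed, corresponding to the chip simulation, sent by the scalar execution module;
performing data processing on the vector-type data to be processed to obtain a first data processing result, so that the scalar execution module generates a simulation result corresponding to the chip simulation according to the first data processing result.
In some embodiments, when acquiring the vector-type data to be processed, corresponding to the chip simulation, sent by the scalar execution module, the processor is configured to implement:
acquiring the physical address, stored in the cache identifier unit, that corresponds to the vector-type data to be processed;
indexing into the shared storage unit according to the physical address to acquire the vector-type data to be processed, wherein the scalar execution module sends the vector-type data to be processed to the shared storage unit.
In some embodiments, when performing data processing on the vector-type data to be processed, the processor is configured to implement:
acquiring the data parameters, stored in the cache identifier unit, that correspond to the vector-type data to be processed, wherein the scalar execution module sends the data parameters to the cache identifier unit;
determining a processing mode for the vector-type data to be processed according to the data parameters, and processing the vector-type data in that mode.
In some embodiments, after performing data processing on the vector-type data to be processed and obtaining the first data processing result, the processor is further configured to implement:
saving the first data processing result to the shared storage unit, and saving the physical address corresponding to the first data processing result to the cache identifier unit.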
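For ease of understanding, the chip simulation method provided by the embodiments of the present invention is described in detail below with reference to the chip simulation system of FIG. 1, the computer device of FIG. 2, and the chip simulation apparatus of FIG. 3. Note that the above chip simulation system, computer device, and chip simulation apparatus do not limit the application scenarios of the chip simulation method provided by the embodiments.
As shown in FIG. 4, FIG. 4 is a schematic flowchart of the steps of a chip simulation method provided by an embodiment of the present invention. The method can be used in the computer device described above to improve the efficiency of chip simulation.
Specifically, as shown in FIG. 4, the method includes steps S101 to S102.
S101. Send vector-type data to be processed corresponding to the chip simulation to a preset vector execution module.
In CPU chip simulation, scalar and vector operations correspond to scalar-type and vector-type data to be processed, respectively. Before the scalar and vector operations are performed, the scalar-type data is first separated from the vector-type data, and the identified vector-type data is sent to the preset vector execution module. The vector execution module is designed and implemented on a semi-custom hardware system, which can accelerate vector operations.
In some examples, the vector execution module includes, but is not limited to, hardware systems such as an FPGA or a GPU that can provide vector operation functions.
After receiving the vector-type data to be processed, the vector execution module performs the corresponding data processing on it. The data processing includes operations such as time-domain/frequency-domain conversion and data encoding/decoding.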
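As a concrete instance of the time-domain to frequency-domain conversion mentioned above, the sketch below computes a direct discrete Fourier transform. It is chosen purely as a familiar example of a vector operation; the patent does not specify which transform or algorithm the vector core unit uses, and a hardware implementation would normally use a fast Fourier transform instead of this quadratic-time form.

```python
import cmath
from typing import List, Sequence


def dft(samples: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) discrete Fourier transform of a block of time-domain samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

Every output bin depends on every input sample, which is why such transforms are natural candidates for offloading to a vector-oriented execution unit.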
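In some embodiments, before step S101, the method may further include: sending configuration information corresponding to the chip simulation to the vector execution module, so that the vector execution module configures the vector core unit according to the configuration information and the vector core unit performs data processing on the vector-type data to be processed.
Before the simulation starts, the vector core unit of the vector execution module is configured. The vector core unit is the core unit of the vector execution module: a hardware execution system optimized specifically for vector operations, configured to process vector-type data to be processed, which can accelerate the vector operations in CPU simulation.
In some examples, configuration information corresponding to the CPU chip simulation is sent to the vector execution module, which, on receiving it, configures its vector core unit accordingly. Taking the configuration of a digital signal processing (DSP) core execution unit as an example, DSP core configuration information for simulating a communication-application chip is sent to the vector execution module, and the vector execution module builds a DSP core execution unit according to the configuration information; this unit is configured to handle the large-scale vector operations found in mobile communications.
In some examples, after the scalar/vector deployment of the CPU chip simulation is determined, the way the vector core unit in the vector execution module is to be built is determined according to the deployment, and a corresponding script file is sent to the vector execution module. On receiving the script file, the vector execution module configures the vector core unit according to it.
After the vector core unit has been configured, the vector execution module feeds back a configuration-complete message. Once this message is received, CPU chip simulation begins and the vector-type data to be processed is sent to the vector execution module. After receiving the data, the vector execution module processes it with the configured vector core unit to obtain the corresponding data processing result. For ease of description, the data processing result obtained by processing vector-type data is referred to below as the first data processing result.
S102. Acquire the first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result.
After the vector core unit has processed the vector-type data and produced the first data processing result, that result is acquired. For example, after obtaining the first data processing result, the vector execution module returns it, and the returned first data processing result is received directly.
As another example, after obtaining the first data processing result, the vector execution module saves it to a corresponding storage device, so the first data processing result can be obtained by querying the storage device.
The first data processing result serves as intermediate information of the CPU chip simulation; based on it, whether the current CPU chip simulation is complete is determined, and the corresponding simulation result is generated.
In some embodiments, while the vector-type data to be processed is being sent to the vector execution module, the corresponding scalar-type data to be processed is processed to obtain its data processing result. For ease of description, the data processing result obtained by processing scalar-type data is referred to below as the second data processing result.
That is, the vector execution module processes the vector-type data while the scalar-type data is processed locally, yielding the first data processing result for the vector-type data and the second data processing result for the scalar-type data.
The first and second data processing results are then analyzed together to generate the simulation result corresponding to the CPU chip simulation.
In some embodiments, the vector execution module further includes a shared storage unit configured to store data to be processed and the corresponding data processing results. Sending the vector-type data to be processed corresponding to the chip simulation to the preset vector execution module may include: sending the vector-type data to the shared storage unit, so that the vector core unit acquires it from the shared storage unit. Acquiring the first data processing result obtained by the vector execution module may include: acquiring the first data processing result stored in the shared storage unit.
The vector-type data to be processed is sent to, and stored in, the shared storage unit of the vector execution module. The vector core unit can acquire the stored vector-type data directly from the shared storage unit, process it to obtain the corresponding first data processing result, and save that result to the shared storage unit. The first data processing result can then be obtained by accessing the shared storage unit.
In some embodiments, the vector execution module further includes a cache identifier unit, which may be configured to store the data parameters corresponding to the vector-type data to be processed; the data parameters include, but are not limited to, data size, data attributes, and data operation type. When the vector-type data is sent to the shared storage unit, the corresponding data parameters are sent to the cache identifier unit at the same time. The vector core unit acquires the stored vector-type data from the shared storage unit and the corresponding data parameters from the cache identifier unit, determines the processing mode for the vector-type data according to the data parameters, and then processes the data in that mode.
In some embodiments, the physical address of the vector-type data stored in the shared storage unit is saved in the cache identifier unit. By acquiring this physical address from the cache identifier unit, the vector core unit can index into the shared storage unit, acquire the corresponding vector-type data, and process it. That is, data is transferred through shared memory with physical address mapping, which reduces the number of data moves and thus increases the running speed of the simulation.
In some embodiments, the data parameters may also be sent to the shared storage unit together with the vector-type data, and the two are stored in association in the shared storage unit. The vector core unit can then acquire the associated vector-type data and data parameters from the shared storage unit, determine the processing mode according to the data parameters, and process the vector-type data in that mode.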
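In some embodiments, the division granularity between the vector-type data to be processed and the scalar-type data to be processed is configured periodically or when a preset condition is met; that is, the scalar/vector division granularity is tuned. For example, if every call to the cache identifier unit takes the same amount of time, the scalar and vector flows are consolidated and the division granularity is adjusted so that the cache identifier unit is called fewer times.
As shown in FIG. 5, FIG. 5 is a schematic flowchart of a chip simulation method provided by another embodiment of the present invention. The chip simulation method can be applied to a chip simulation apparatus and includes steps S201 and S202.
S201. Acquire vector-type data to be processed, corresponding to the chip simulation, sent by the scalar execution module.
In CPU chip simulation, before scalar and vector operations are performed, the scalar execution module configured in the computer device first separates the scalar-type data to be processed from the vector-type data to be processed and sends the identified vector-type data to the chip simulation apparatus. The scalar execution module is designed and implemented in a high-level language. In some examples, a vector execution module is preset in the chip simulation apparatus; it is designed and implemented on a semi-custom hardware system, which can accelerate vector operations. In some examples, the vector execution module includes, but is not limited to, hardware systems such as an FPGA or a GPU that can provide vector operation functions. The chip simulation apparatus receives, through the vector execution module, the vector-type data to be processed sent by the scalar execution module.
In some examples, the vector execution module includes a vector core unit, and the vector core unit receives the vector-type data to be processed sent by the scalar execution module.
In some embodiments, acquiring the vector-type data to be processed, corresponding to the chip simulation, sent by the scalar execution module may include: acquiring the physical address, stored in the cache identifier unit, that corresponds to the vector-type data to be processed; and indexing into the shared storage unit according to the physical address to acquire the vector-type data to be processed, wherein the scalar execution module sends the vector-type data to be processed to the shared storage unit.
The vector execution module further includes a shared storage unit and a cache identifier unit. The shared storage unit stores the vector-type data to be processed, and the cache identifier unit stores the corresponding physical address. The physical address is acquired from the cache identifier unit, and the shared storage unit is then indexed by that address to acquire the corresponding vector-type data. That is, data is transferred through shared memory with physical address mapping, which reduces the number of data moves and thus increases the running speed of the simulation.
S202. Perform data processing on the vector-type data to be processed to obtain a first data processing result, so that the scalar execution module generates a simulation result corresponding to the chip simulation according to the first data processing result.
After the vector-type data to be processed is acquired, it is processed to obtain the corresponding data processing result. For ease of description, this result is referred to below as the first data processing result.
In some embodiments, performing data processing on the vector-type data to be processed may include: acquiring the data parameters, stored in the cache identifier unit, that correspond to the vector-type data to be processed, wherein the scalar execution module sends the data parameters to the cache identifier unit; and determining a processing mode for the vector-type data according to the data parameters and processing the vector-type data in that mode.
The data parameters corresponding to the vector-type data to be processed include, but are not limited to, data size, data attributes, and data operation type. The cache identifier unit may also be configured to store these data parameters. The scalar execution module sends the data parameters to the cache identifier unit at the same time as it sends the vector-type data to the shared storage unit. The stored vector-type data is acquired from the shared storage unit and the corresponding data parameters from the cache identifier unit; the processing mode is determined according to the data parameters, and the vector-type data is then processed in that mode.
In some embodiments, after the vector-type data to be processed has been processed and the first data processing result obtained, the method may include: saving the first data processing result to the shared storage unit, and saving the physical address corresponding to the first data processing result to the cache identifier unit.
After the first data processing result corresponding to the vector-type data is obtained, it is not returned directly to the scalar execution module. Instead, the first data processing result is saved to the shared storage unit, and the physical address at which it is stored in the shared storage unit is saved to the cache identifier unit. The scalar execution module can then obtain the first data processing result by acquiring its physical address from the cache identifier unit and indexing into the shared storage unit.
The scalar execution module generates the simulation result corresponding to the CPU chip simulation according to the first data processing result. In some examples, the scalar execution module processes the corresponding scalar-type data to be processed to obtain its data processing result. For ease of description, the data processing result obtained by processing scalar-type data is referred to below as the second data processing result. That is, the vector execution module processes the vector-type data while the scalar execution module processes the scalar-type data, yielding the first data processing result for the vector-type data and the second data processing result for the scalar-type data. The scalar execution module analyzes the first and second data processing results together and generates the simulation result corresponding to the CPU chip simulation.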
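As shown in FIG. 6, the detailed flow of CPU chip simulation is as follows:
Step1: The scalar execution module sends configuration information to the vector execution module;
Step2: The vector execution module completes the configuration of the vector core unit according to the configuration information;
Step3: The scalar execution module continuously updates and sends the vector-type data to be processed and the corresponding data parameters;
Step4: The vector core unit continuously acquires the vector-type data to be processed and the corresponding data parameters;
Step5: The vector core unit performs data processing on the vector-type data to be processed and feeds the results back to the scalar execution module;
Step6: The scalar execution module determines from the feedback whether the simulation is complete; if so, Step7 is executed; otherwise, execution returns to Step3;
Step7: The simulation results are output.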
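In the above embodiments, the vector-type data to be processed corresponding to the chip simulation is sent to the preset vector execution module, which processes it to obtain the corresponding first data processing result. The simulation result corresponding to the chip simulation is generated from the first data processing result; that is, the vector operations in the chip simulation are separated out and processed independently, which improves the efficiency of the chip simulation.
Embodiments of the present invention also provide a computer-readable storage medium that stores a computer program. The computer program includes program instructions, and a processor executes the program instructions to implement any one of the chip simulation methods provided in the embodiments of the present invention.
For example, the computer program, when loaded by the processor, can perform the following steps:
sending vector-type data to be processed corresponding to the chip simulation to a preset vector execution module;
acquiring a first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result.
The embodiments of the present invention disclose a chip simulation method, apparatus, device, and storage medium. The vector-type data to be processed corresponding to the chip simulation is sent to the preset vector execution module and processed there to obtain the corresponding first data processing result, and the simulation result corresponding to the chip simulation is generated from the first data processing result; that is, the vector operations in the chip simulation are separated out and processed independently, which improves the efficiency of the chip simulation.
The computer-readable storage medium may be an internal storage unit of the chip simulation system of the foregoing embodiments, such as a hard disk or a memory of the chip simulation system. The computer-readable storage medium may also be an external storage device of the chip simulation system, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the chip simulation system.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and these modifications or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.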

Claims (14)

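  1. A chip simulation method, comprising:
    sending vector-type data to be processed corresponding to a chip simulation to a preset vector execution module;
    acquiring a first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, so as to generate a simulation result corresponding to the chip simulation according to the first data processing result.
  2. The chip simulation method according to claim 1, wherein before the sending of the vector-type data to be processed corresponding to the chip simulation to the preset vector execution module, the method comprises:
    sending configuration information corresponding to the chip simulation to the vector execution module, so that the vector execution module configures a vector core unit according to the configuration information and the vector core unit performs data processing on the vector-type data to be processed.
  3. The chip simulation method according to claim 2, wherein the vector execution module comprises a shared storage unit, and the sending of the vector-type data to be processed corresponding to the chip simulation to the preset vector execution module comprises:
    sending the vector-type data to be processed to the shared storage unit, so that the vector core unit acquires the vector-type data to be processed from the shared storage unit;
    the acquiring of the first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed comprises:
    acquiring the first data processing result stored in the shared storage unit.
  4. The chip simulation method according to claim 3, wherein the vector execution module further comprises a cache identifier unit, and the method further comprises:
    sending data parameters corresponding to the vector-type data to be processed to the cache identifier unit, so that the vector core unit acquires the data parameters from the cache identifier unit, determines a processing mode for the vector-type data to be processed according to the data parameters, and performs data processing in the processing mode on the vector-type data to be processed.
  5. The chip simulation method according to any one of claims 1 to 4, further comprising:
    performing data processing on scalar-type data to be processed corresponding to the chip simulation to obtain a second data processing result;
    after the acquiring of the first data processing result obtained by the vector execution module performing data processing on the vector-type data to be processed, the method comprises:
    generating the simulation result according to the first data processing result and the second data processing result.
  6. The chip simulation method according to claim 5, further comprising:
    configuring the division granularity between the vector-type data to be processed and the scalar-type data to be processed.
  7. A chip simulation method, comprising:
    acquiring vector-type data to be processed, corresponding to a chip simulation, sent by a scalar execution module;
    performing data processing on the vector-type data to be processed to obtain a first data processing result, so that the scalar execution module generates a simulation result corresponding to the chip simulation according to the first data processing result.
  8. The chip simulation method according to claim 7, wherein the acquiring of the vector-type data to be processed, corresponding to the chip simulation, sent by the scalar execution module comprises:
    acquiring a physical address, stored in a cache identifier unit, that corresponds to the vector-type data to be processed;
    indexing into a shared storage unit according to the physical address to acquire the vector-type data to be processed, wherein the scalar execution module sends the vector-type data to be processed to the shared storage unit.
  9. The chip simulation method according to claim 8, wherein the performing of data processing on the vector-type data to be processed comprises:
    acquiring data parameters, stored in the cache identifier unit, that correspond to the vector-type data to be processed, wherein the scalar execution module sends the data parameters to the cache identifier unit;
    determining a processing mode for the vector-type data to be processed according to the data parameters, and performing data processing in the processing mode on the vector-type data to be processed.
  10. The chip simulation method according to claim 8 or 9, wherein after the performing of data processing on the vector-type data to be processed to obtain the first data processing result, the method comprises:
    saving the first data processing result to the shared storage unit, and saving the physical address corresponding to the first data processing result to the cache identifier unit.
  11. A computer device, comprising a memory and a processor; wherein
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program and, when executing the computer program, implement the chip simulation method according to any one of claims 1 to 6.
  12. A chip simulation apparatus, comprising a memory and a processor; wherein
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program and, when executing the computer program, implement the chip simulation method according to any one of claims 7 to 10.
  13. A chip simulation system, comprising the computer device according to claim 11 and the chip simulation apparatus according to claim 12, wherein the computer device is communicatively connected to the chip simulation apparatus.
  14. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to implement the chip simulation method according to any one of claims 1 to 6, or to implement the chip simulation method according to any one of claims 7 to 10.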
PCT/CN2021/089017 2020-06-28 2021-04-22 Chip simulation method, apparatus, device, system, and storage medium WO2022001317A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21832822.7A EP4170538A4 (en) 2020-06-28 2021-04-22 CHIP SIMULATION METHOD, DEVICE AND SYSTEM AS WELL AS DEVICE AND STORAGE MEDIUM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010599906.5 2020-06-28
CN202010599906.5A CN113849951A (zh) 2020-06-28 2020-06-28 Chip simulation method, apparatus, device, system, and storage medium

Publications (1)

Publication Number Publication Date
WO2022001317A1 (zh) 2022-01-06

Family

ID=78972551

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/089017 WO2022001317A1 (zh) 2020-06-28 2021-04-22 Chip simulation method, apparatus, device, system, and storage medium

Country Status (3)

Country Link
EP (1) EP4170538A4 (zh)
CN (1) CN113849951A (zh)
WO (1) WO2022001317A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236263A (zh) * 2023-11-15 2023-12-15 之江实验室 Multi-chiplet interconnect simulation method, apparatus, storage medium, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102141951A (zh) * 2010-11-25 2011-08-03 华为技术有限公司 Chip simulation system and method
CN106841974A (zh) * 2016-12-13 2017-06-13 深圳市紫光同创电子有限公司 FPGA test platform and method
CN108038328A (zh) * 2017-12-24 2018-05-15 苏州赛源微电子有限公司 Automatic chip simulation and verification system
US10140161B1 (en) * 2017-04-28 2018-11-27 EMC IP Holding Company LLC Workload aware dynamic CPU processor core allocation
CN111025134A (zh) * 2019-12-30 2020-04-17 北京自动测试技术研究所 Test method and system for a system-on-chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4170538A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
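CN117236263A (zh) * 2023-11-15 2023-12-15 之江实验室 Multi-chiplet interconnect simulation method, apparatus, storage medium, and electronic device
CN117236263B (zh) * 2023-11-15 2024-02-06 之江实验室 Multi-chiplet interconnect simulation method, apparatus, storage medium, and electronic device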

Also Published As

Publication number Publication date
EP4170538A4 (en) 2023-12-27
CN113849951A (zh) 2021-12-28
EP4170538A1 (en) 2023-04-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21832822

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021832822

Country of ref document: EP

Effective date: 20230118

NENP Non-entry into the national phase

Ref country code: DE