CN118210242A - Semi-physical OpenCL model simulation method and device and computing equipment - Google Patents


Info

Publication number: CN118210242A
Application number: CN202410320231.4A
Authority: CN (China)
Prior art keywords: simulation, data, opencl, pipeline, model
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 曾子强, 康烁
Current Assignee: Zhejiang Dijie Software Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Zhejiang Dijie Software Technology Co ltd
Application filed by Zhejiang Dijie Software Technology Co ltd
Priority to: CN202410320231.4A
Publication of: CN118210242A

Abstract

The application relates to the field of simulation, and particularly discloses a semi-physical OpenCL model simulation method and apparatus and a computing device. The semi-physical OpenCL model simulation method comprises the following steps. A memory model is created at the simulation abstraction layer on the target-machine side and loaded by a corresponding process. At the simulation abstraction layer on the pipeline side, corresponding processes and threads are created to monitor data updated by the memory model; a scheduler analyzes the device to distinguish the various data sources and performs distribution, and buffers and threads are allocated to distribute and schedule, back to the target-machine side, the simulation parameters and results produced by inter-process-communication interactions. At the simulation abstraction layer on the client side, a process is created that is compatible with the graphics-card driver and can communicate with the pipeline. Based on the application, the problem of the GPU's complex device attributes can be avoided, and low-burden virtual simulation of the GPU's OpenCL logic functions can be realized on a virtual simulation platform.

Description

Semi-physical OpenCL model simulation method and device and computing equipment
Technical Field
The present application relates to the field of simulation, and in particular, to a semi-physical OpenCL model simulation method, apparatus, and computing device.
Background
Digital simulation is widely applied in research and development in fields such as aerospace, aviation, and automobiles. Simulation experiments generate large amounts of simulation data, such as simulated models, parameter configurations, and simulation results, which are valuable technical accumulation and assets, particularly the simulation result data. In the field of digital simulation, the difficulty of GPU simulation mainly lies in the GPU's complex functional logic. On one hand, a GPU comprises many logic modules: a single GPU contains a graphics memory controller, a compression unit, a BIOS, a graphics and computing array, a bus interface, a power management unit, a video management unit, a display interface, and so on. Hardware-behavior-level simulation of such complex functional logic modules is both difficult and labor-intensive. On the other hand, the parallel computing power of GPUs is currently almost impossible to reproduce with common device-simulation approaches.
In the prior art, even where the related functions of a GPU device are realized by API-level simulation, the technique is limited to a specific hardware platform, has poor universality, and cannot be made compatible with the GPU applications of other embedded systems.
Therefore, a new simulation solution is needed to solve the drawbacks of the prior art.
Disclosure of Invention
The invention aims to provide a semi-physical OpenCL model simulation method, apparatus, and computing device that are compatible, on the basis of semi-physical simulation, with the OpenCL function simulation of different embedded systems. By adopting a multi-terminal communication mode, the problem of the GPU's complex device attributes can be avoided, and low-burden virtual simulation of the GPU's OpenCL logic functions can be realized on a virtual simulation platform.
According to one aspect of the present application, there is provided a semi-physical OpenCL model simulation method, comprising the following. Simulation abstraction layers are established for the target machine, the pipeline, and the client respectively, and static/dynamic coding techniques are adopted in cooperation with the driver abstraction layer. The abstraction layers are built modularly and are loaded by the virtual simulation platform in the form of dynamic libraries. A memory model is created at the simulation abstraction layer on the target-machine side and loaded by a corresponding process. At the simulation abstraction layer on the pipeline side, corresponding processes and threads are created to monitor data updated by the memory model; a scheduler analyzes the device to distinguish the various data sources and performs distribution, and buffers and threads are allocated to distribute and schedule, back to the target-machine side, the simulation parameters and results produced by inter-process-communication interactions. The inter-process-communication interaction includes: the pipeline accesses and forwards data packets created under the encoding/packing protocol, where the protocol encodes the packets using the API data flow and a lightweight description. The lightweight description encodes each packet as an interface code, the overall length, and the parameters passed to the interface, in a fixed format. After the pipeline obtains the encoded packet data, multi-process/multi-thread/multi-machine communication is realized through a TCP/UDP transmission mechanism.
Allocating buffers and threads to distribute data includes sending the contact signals of a touch-screen device to the pipeline and distributing them through the pipeline's dispatcher, realizing the scenario of picture display with touch-screen interaction. At the simulation abstraction layer on the client side, a process is created that is compatible with the graphics-card driver and can communicate with the pipeline; the model receives the inter-process-communication data to perform synchronous parallel simulation, and after the simulation run it returns the corresponding simulation parameters and results to the pipeline side. The graphics-card driver loads the simulation results on the client side and outputs them to the corresponding display peripheral.
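The "interface code, overall length, parameters" lightweight description above can be sketched as a C layout. This is an illustrative assumption about the wire format (names `api_packet` and `encode_api_call` are hypothetical), not the patent's actual encoding:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one encoded API call: an interface opcode,
 * the overall packet length, then the raw parameter bytes. */
typedef struct {
    uint32_t opcode;    /* which OpenCL API this packet encodes    */
    uint32_t total_len; /* overall length of the packet, in bytes  */
    uint8_t  params[];  /* parameter bytes, in call-argument order */
} api_packet;

/* Pack one API call into a caller-provided buffer; returns packet size. */
size_t encode_api_call(uint8_t *buf, uint32_t opcode,
                       const void *params, uint32_t params_len)
{
    api_packet *pkt = (api_packet *)buf;
    pkt->opcode    = opcode;
    pkt->total_len = (uint32_t)(sizeof *pkt + params_len);
    if (params_len > 0)
        memcpy(pkt->params, params, params_len);
    return pkt->total_len;
}
```

A pipeline process that receives such a packet can read the fixed 8-byte header first, then consume exactly `total_len` bytes before forwarding over TCP/UDP.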
Optionally, in the method according to the present application, allocating buffers and threads to distribute and schedule the simulation parameters and results produced by inter-process-communication interactions includes: module A encodes and packs the data and transmits it to module B through an IPC protocol; module B receives the corresponding data through the IPC protocol and performs decoding, parsing, and processing on it; and the simulation result is output to the corresponding display peripheral.
Optionally, in the method according to the present application, performing decoding and parsing on the received data and outputting the simulation result to the corresponding display peripheral includes: performing interface analysis on the data packets of all OpenCL API functions one by one; according to the decoding result, calling the local machine's graphics-card driver to realize the driver interface corresponding to OpenCL; and, if the API needs to return a value or input events generated by some input device are captured, transmitting the result or event information of the locally executed API call back to the target machine through the pipeline. The host-processor CPU and other hardware of the OpenCL device are emulated. The OpenCL parallel model, including task parallelism and data parallelism, is simulated, using multithreading or other means to implement parallel operations. The OpenCL event mechanism is simulated for synchronization control and mutual exclusion of the parallel flows.
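The one-by-one interface analysis can be pictured as a dispatcher that switches on the decoded opcode and routes each call to the local driver, flagging whether a reply must travel back through the pipe. The opcode names and the `dispatch_opcode` helper are illustrative assumptions; the handlers below are stand-ins, not real driver entry points:

```c
#include <stdint.h>

/* Hypothetical opcode table for a few OpenCL APIs. */
enum cl_opcode {
    OP_GET_PLATFORM_IDS = 1,
    OP_CREATE_BUFFER    = 2,
    OP_ENQUEUE_KERNEL   = 3,
};

/* Return 0 on success, -1 for an unknown opcode; *needs_reply is set
 * when the decoded API has a return value to send back via the pipe. */
int dispatch_opcode(uint32_t opcode, int *needs_reply)
{
    switch (opcode) {
    case OP_GET_PLATFORM_IDS: /* would call the host clGetPlatformIDs */
        *needs_reply = 1;     /* platform list must travel back       */
        return 0;
    case OP_CREATE_BUFFER:    /* would call the host clCreateBuffer   */
        *needs_reply = 1;     /* handle of the new buffer goes back   */
        return 0;
    case OP_ENQUEUE_KERNEL:   /* would call clEnqueueNDRangeKernel    */
        *needs_reply = 0;     /* fire-and-forget in this sketch       */
        return 0;
    default:
        return -1;            /* unknown interface code               */
    }
}
```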
Optionally, in the method according to the present application, implementing the driver interface corresponding to OpenCL includes parsing and processing the memory model, the kernel model, the thread model, the command queue, and the synchronization-mechanism model. Processing the kernel model includes parsing OpenCL kernel functions, creating and destroying OpenCL objects, and handling OpenCL events. Processing the memory model includes allocating and releasing memory, performing read/write operations on memory, mapping and unmapping memory objects, creating and releasing memory buffer objects, and operating on work-group local memory. Processing the thread model includes handling work items, work groups, and kernel thread assignments. Processing the command queue includes parsing the command queues in the OpenCL application and simulating the execution of the commands. Processing the synchronization-mechanism model includes event synchronization and the synchronization of work items and work groups.
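For the command-queue part, a minimal sketch is a FIFO whose commands are parsed in and then "executed" in order. The command kinds, queue capacity, and function names are illustrative assumptions, not the patent's model:

```c
#include <stddef.h>

/* Assumed command kinds for the sketch. */
enum cmd_kind { CMD_WRITE_MEM, CMD_RUN_KERNEL, CMD_READ_MEM };

#define QUEUE_CAP 32

typedef struct {
    enum cmd_kind cmds[QUEUE_CAP];
    size_t head, tail;   /* dequeue from head, enqueue at tail */
} cmd_queue;

/* Parse one command into the queue; -1 if the queue is full. */
int enqueue_cmd(cmd_queue *q, enum cmd_kind c)
{
    if (q->tail >= QUEUE_CAP) return -1;
    q->cmds[q->tail++] = c;
    return 0;
}

/* Drain the queue in FIFO order, counting each simulated command;
 * returns the number of commands executed. */
size_t flush_queue(cmd_queue *q)
{
    size_t executed = 0;
    while (q->head < q->tail) {
        q->head++;       /* a real model would act on cmds[head] */
        executed++;
    }
    return executed;
}
```

In-order draining mirrors OpenCL's default in-order queue semantics; out-of-order execution would require the event-based synchronization model described above.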
Optionally, in the method according to the present application, module A and module B exist in the same process, with module B being a thread of module A; or module A and module B run in parallel on one physical machine as two independent processes; or module A and module B run on two physical machines supporting remote communication, with module A running as a separate process that encodes the data packets and sends the data to module B through the IPC protocol.
Optionally, in the method according to the present application, returning the result or event information of the API call includes: if a parameter is of numeric type, the interface code and the parameter are written directly into the device memory; if a parameter is of array type, the interface code and the parameter are written directly into the device memory, the length of the array data is additionally recorded, and then the array data is written in full; and when there is a return value, the interface code and parameters are written directly into the device memory and a read-back loop is entered to wait for the return value from the host side.
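The three rules above can be sketched against a simulated device-memory window. The layout, the ready flag, and all function names are assumptions for illustration, not the real device register map:

```c
#include <stdint.h>
#include <string.h>

#define DEV_MEM_SIZE 256

typedef struct {
    uint8_t  mem[DEV_MEM_SIZE]; /* simulated device memory window */
    uint32_t ret_ready;         /* host sets this once the return */
    uint32_t ret_value;         /* value has been written         */
} device_mem;

/* Rule 1: numeric parameter - write interface code and value directly. */
size_t write_numeric(device_mem *d, uint32_t code, uint32_t value)
{
    memcpy(d->mem, &code, 4);
    memcpy(d->mem + 4, &value, 4);
    return 8;
}

/* Rule 2: array parameter - additionally record the length, then the data. */
size_t write_array(device_mem *d, uint32_t code,
                   const uint8_t *arr, uint32_t len)
{
    memcpy(d->mem, &code, 4);
    memcpy(d->mem + 4, &len, 4);
    memcpy(d->mem + 8, arr, len);
    return 8 + (size_t)len;
}

/* Rule 3: when a return value exists, loop on read-back until the host
 * side has produced it (bounded here so the sketch always terminates). */
int read_back(const device_mem *d, uint32_t *out, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (d->ret_ready) { *out = d->ret_value; return 0; }
    }
    return -1; /* host never answered within the polling budget */
}
```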
Optionally, in the method according to the present application, performing decoding and parsing on the data includes: decoding the functions and performing interface analysis on the data packets of all OpenCL API functions one by one; emulating the host-processor CPU and other hardware of the OpenCL device; providing a parallel-operation framework; and handling the cooperative synchronization mechanism, simulating the OpenCL event mechanism for synchronization control and mutual exclusion of the parallel flows.
According to still another aspect of the present application, there is provided a semi-physical OpenCL model simulation apparatus comprising a target machine, a pipeline, and a client. Simulation abstraction layers are established on the corresponding devices for the target machine, the pipeline, and the client respectively, and static/dynamic coding techniques are adopted in cooperation with the driver abstraction layer. The abstraction layers are built modularly and are loaded by the virtual simulation platform in the form of dynamic libraries. A memory model is created at the simulation abstraction layer on the target-machine side and loaded by a corresponding process. At the simulation abstraction layer on the pipeline side, corresponding processes and threads are created to monitor data updated by the memory model; a scheduler analyzes the device to distinguish the various data sources and performs distribution, and buffers and threads are allocated to distribute and schedule, back to the target-machine side, the simulation parameters and results produced by inter-process-communication interactions. The inter-process-communication interaction includes: the pipeline accesses and forwards data packets created under the encoding/packing protocol, where the protocol encodes the packets using the API data flow and a lightweight description. The lightweight description encodes each packet as an interface code, the overall length, and the parameters passed to the interface, in a fixed format. After the pipeline obtains the encoded packet data, multi-process/multi-thread/multi-machine communication is realized through a TCP/UDP transmission mechanism.
Allocating buffers and threads to distribute data includes sending the contact signals of a touch-screen device to the pipeline and distributing them through the pipeline's dispatcher, realizing the scenario of picture display with touch-screen interaction. At the simulation abstraction layer on the client side, a process is created that is compatible with the graphics-card driver and can communicate with the pipeline; the model receives the inter-process-communication data to perform synchronous parallel simulation, and after the simulation run it returns the corresponding simulation parameters and results to the pipeline side. The graphics-card driver loads the simulation results on the client side and outputs them to the corresponding display peripheral.
According to yet another aspect of the present application, there is provided a computing device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods described above.
According to yet another aspect of the present application, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods described above.
In summary, the scheme of the application, based on semi-physical OpenCL simulation, achieves OpenCL function simulation compatible with different embedded systems by adopting a multi-terminal communication mode, so that the problem of the GPU's complex device attributes can be avoided and low-burden virtual simulation of the GPU's OpenCL logic functions can be realized on a virtual simulation platform. Compared with the prior art, the method is compatible, at low overhead, with a software layer/application layer running arbitrary graphics applications, is easy to extend, and can quickly realize simulation of various GPU devices without additional engineering expenditure.
Compared with the prior art, the invention has the following advantages:
Advantage 1: good extensibility and high flexibility.
The invention is based on interface simulation: it only needs to realize, for the target GPU, the statically and dynamically coded versions of standard graphics libraries such as OpenCL. Because these interface specifications are backward compatible, once most interfaces are implemented, realizing a further GPU simulation module requires simulating only a small number of additional interfaces. Meanwhile, the whole simulation model is constructed modularly, making it very flexible and easy to extend.
Advantage 2: strong performance.
The essence of the method is to directly call the host machine's GPU device to complete the functions of the target GPU; compared with a traditional processor-device simulation model, directly calling the host's GPU device is certainly more efficient than simulating a GPU device.
Advantage 3: good universality.
In this technical scheme, the components are attached to each machine as plug-ins and can be fine-tuned for the actual service scenario to meet the differing requirements of different services. The universality is good and the adaptability is high.
Drawings
FIG. 1 is a schematic diagram of the software and hardware emulation framework of the present invention;
FIG. 2 is a diagram of an API code format according to the present invention;
FIG. 3 is a schematic diagram of the pipe communication principle of the present invention;
Fig. 4 is a schematic diagram of the pipe communication principle between different machines of the present invention.
Detailed Description
In order that the above-recited objects, features, and advantages of the present application may become more readily apparent, a more particular description of the application is given below with reference to the accompanying drawings and the detailed description. The embodiments described herein are specific embodiments of the present application, intended to illustrate and exemplify the inventive concept, and should not be construed as limiting the scope of the application or its embodiments. In addition to the embodiments described herein, those skilled in the art can adopt other obvious solutions based on the disclosure of the claims and specification, including any obvious substitutions and modifications of the embodiments described herein.
The drawings in the present specification are schematic diagrams, assist in explaining the concept of the present invention, and schematically represent interrelationships of the parts.
First, fig. 1 shows the software and hardware structure and principle of OpenCL simulation based on semi-physical simulation; the terms related to the present invention are explained as follows in connection with fig. 1:
Open computing language (OpenCL): the open computing language (English: Open Computing Language, abbreviated OpenCL) is an open, royalty-free standard for cross-platform parallel programming of the various accelerators found in supercomputers, cloud servers, personal computers, mobile devices, and embedded platforms. OpenCL greatly enhances the speed and responsiveness of a wide range of applications in numerous market categories, including professional creative tools, scientific and medical software, vision processing, and neural-network training and inference. OpenCL consists of a language (based on C99) in which kernels are written and a set of APIs for defining and controlling the platform; it is mainly used for parallel computing.
Graphics Processor (GPU): a graphics processor (English: graphics processing unit, abbreviated GPU), also called display core, visual processor, or display chip, is a microprocessor specially used for image- and graphics-related computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones).
Application Programming Interface (API): an application programming interface (English: Application Programming Interface, abbreviated API) is a set of predefined functions designed to give applications and developers the ability to access a set of routines based on certain software or hardware without having to access source code or understand the details of the internal operating mechanisms.
Inter-process communication (IPC): inter-process communication (English: Inter-Process Communication, abbreviated IPC) is a set of programming interfaces that let programmers coordinate different processes running simultaneously in an operating system so that they can transfer and exchange information with each other. This enables one program to handle the demands of many users at the same time.
Transmission Control Protocol (TCP): the Transmission Control Protocol (English: Transmission Control Protocol, abbreviated TCP) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol, defined in IETF RFC 793.
User Datagram Protocol (UDP): the User Datagram Protocol (English: User Datagram Protocol, abbreviated UDP) is a connectionless transport-layer protocol in the Open Systems Interconnection (English: Open System Interconnection, abbreviated OSI) reference model, providing a transaction-oriented, simple, unreliable message-transfer service; IETF RFC 768 is the formal specification for UDP. The protocol number of UDP in an IP packet is 17.
Basic Input Output System (BIOS): a basic input/output system (English: Basic Input Output System, abbreviated BIOS) is a set of programs burned into a ROM chip on the motherboard inside a computer.
Application (APP): an Application (APP) refers to a computer program that operates in user mode to interact with a user and has a visual user interface for performing a particular task or tasks.
Semi-physical simulation: semi-physical simulation refers to introducing part of the simulated object system into the simulation loop in physical form (or as a physical model), according to the research content, while the rest of the simulated object system is described by a mathematical model and converted into a simulation computing model.
QEMU emulator: the QEMU emulator (English: Quick Emulator, abbreviated QEMU) is an open-source embedded emulator and virtual machine written by Fabrice Bellard and distributed under the GPL license, widely used on the GNU/Linux platform.
Host side (host): the host side exists relative to the client side; it takes on the role of a server, providing services for other devices, and remote network communication can be performed between the client and the host.
Pipe (pipe): pipeline communication (English: Communication Pipeline, abbreviated pipe) means that a sending process writes a large amount of data into the pipe as a character stream, a receiving process receives the data from the pipe, and the two communicate through the pipe.
Client (client): the client is a program that exists relative to the host and provides local services to the user; remote network communication can be performed between the client and the host.
Operation code (opcode): an operation code (English: Operation Code) is an instruction code: the portion or field of an instruction (usually expressed as a code) specified in a computer program to perform an operation; in effect it is an instruction sequence number.
The simulation technology for GPU devices can be traced back to the early development stage of Android-emulator technology: in order to improve the development efficiency of Android apps, Google provided an open-source Android emulator, goldfish, which completely simulates a real ARM processor by means of QEMU, achieving the goal of running a complete Android operating system on an x86 computer.
However, goldfish does not simulate the GPU device at the hardware-behavior level; it only reaches application programming interface (API, Application Programming Interface) level simulation, where the simulated API is mainly the GPU driver on the Android system, chiefly the OpenGL ES library. Similarly, Apple's iOS Simulator also only simulates the iOS operating system at the API level, combined with an inter-process communication (IPC, Inter-Process Communication) mechanism to simulate applications; viewed as a simulator, its whole-system simulation fidelity is even less complete than goldfish's.
Without exception, goldfish and the iOS Simulator are quite limited for simulating GPU devices; being based on API simulation, they do not have good extensibility. They were designed only for the Android and iOS operating systems and are strongly coupled to the operating system itself, because goldfish and the iOS Simulator were designed for the purpose of running apps, not as generic GPU simulators capable of running a wide variety of graphics applications.
Furthermore, goldfish and the iOS Simulator do not attend to the specific functional details of the GPU device, so long as the graphics interfaces on which the app depends can be emulated. In practice, however, in some embedded systems, driver-layer emulation requires attention to the specific GPU device model and the specific functions it supports (although these features are also reflected in APIs, in many cases those APIs do not belong to the standard graphics specifications, and each manufacturer has its own proprietary implementation). Obviously, dedicated simulators like goldfish and the iOS Simulator cannot satisfy this.
In short, the prior art's simulation of GPU devices is quite limited: it is based on API simulation, lacks good extensibility, is designed only for a specific operating system, and is strongly coupled to that operating system. Meanwhile, prior-art GPU simulation does not attend to the specific functional details of the GPU device and can only simulate the graphics interfaces on which an app depends; yet in some embedded-simulation fields, the GPU device model and its hardware information must be taken into account, which the prior art can hardly achieve.
The technical core of the embodiments of the application is to provide a GPU simulation method and system based on a virtual simulation platform, aiming to simulate OpenCL functions compatibly across different embedded systems on the basis of semi-physical OpenCL simulation. By adopting a multi-terminal communication mode, the problem of the GPU's complex device attributes can be avoided, and low-burden virtual simulation of the GPU's OpenCL logic functions can be realized on a virtual simulation platform. Moreover, the method is compatible, at low overhead, with a software layer/application layer running arbitrary graphics applications, is easy to extend, and can quickly realize simulation of various GPU devices without additional engineering expenditure.
The invention aims to provide a semi-physical OpenCL model simulation method, apparatus, and computing device that are compatible, on the basis of semi-physical simulation, with the OpenCL function simulation of different embedded systems; by adopting a multi-terminal communication mode, the problem of the GPU's complex device attributes can be avoided, and low-burden virtual simulation of the GPU's OpenCL logic functions can be realized on a virtual simulation platform.
Fig. 1 shows a semi-physical OpenCL model simulation apparatus according to the present application. According to the embodiment of the application, the semi-physical OpenCL model simulation device is realized by combining hardware and software. The semi-physical OpenCL model simulation device is suitable for any intelligent machine with a network communication function.
The invention relates to a semi-physical OpenCL model simulation device, which mainly comprises three modules, namely a target machine, a pipeline and a client:
The invention needs to establish corresponding simulation abstraction layers on the target machine, the pipeline, and the client respectively, and to adopt static/dynamic coding techniques in cooperation with the driver abstraction layer.
The abstraction layers are built modularly and are loaded by the virtual simulation platform in the form of dynamic libraries. A memory model is created at the simulation abstraction layer on the target-machine side and loaded by a corresponding process.
At the simulation abstraction layer on the pipeline side, corresponding processes and threads are created to monitor data updated by the memory model; a scheduler analyzes the device to distinguish the various data sources and performs distribution, and buffers and threads are allocated to distribute and schedule, back to the target-machine side, the simulation parameters and results produced by inter-process-communication interactions.
The inter-process-communication interaction includes: the pipeline accesses and forwards data packets created under the encoding/packing protocol, where the protocol encodes the packets using the API data flow and a lightweight description. The lightweight description encodes each packet as an interface code, the overall length, and the parameters passed to the interface, in a fixed format. After the pipeline obtains the encoded packet data, multi-process/multi-thread/multi-machine communication is realized through a TCP/UDP transmission mechanism. Allocating buffers and threads to distribute data includes sending the contact signals of a touch-screen device to the pipeline and distributing them through the pipeline's dispatcher, realizing the scenario of picture display with touch-screen interaction.
At the simulation abstraction layer on the client side, a process is created that is compatible with the graphics-card driver and can communicate with the pipeline; the model receives the inter-process-communication data to perform synchronous parallel simulation, and after the simulation run it returns the corresponding simulation parameters and results to the pipeline side. The graphics-card driver loads the simulation results on the client side and outputs them to the corresponding display peripheral.
1. Target machine
The target machine participates in the compiling and running of the target-machine program, captures API data, encodes its parameters, and writes them into the corresponding memory space. There are various methods of capturing API data, which basically fall into two categories: static and dynamic.
1) Static method: this is realized by two means. The first adopts a compiler preprocessing technique to preprocess the GPU driver contained in the target program, adding marks that the simulation platform can identify and marking the text sections of the OpenCL-related APIs (application program interfaces); the second is to re-encode the OpenCL lib library, replacing it with our own implementation of the library that conveniently interacts with skyeye.
Static coding preprocesses the GPU driver contained in the target program and adds marks that the simulation platform can identify, i.e., a compiler preprocessing technique is applied to the target program at the compilation stage; alternatively, the driver library may be recoded and replaced with a library that facilitates communication with the virtual simulation platform.
2) Dynamic method: a dynamic binary translation technique is adopted to capture and encode the instruction segments of the OpenCL API while the target program runs, without modifying the target program. However, this approach is considerably more difficult and requires adaptation to a variety of architectures.
The present invention preferably employs the second means of the static approach to capture the OpenCL APIs. This method is the easiest to implement and extend, and meets the expected target.
3) Coding interface
The host needs to encode interfaces in two parts: the OpenCL interface and the driver-related interface.
The OpenCL interface must target a specific OpenCL API version, with all interfaces of that version encoded in full to support it. Note that because this part of the interface is fully adapted to a specific version, OpenCL interfaces of different versions may be incompatible; at the same time, as long as the version's interfaces are implemented completely, the encoder can be adapted to same-version OpenCL drivers of other GPU devices.
The driver-related interface concerns the OpenCL device model and comprises a memory model, a kernel model, a thread model, a command queue, and a synchronization mechanism model. The specific model details are as follows:
A) Kernel model: OpenCL kernel function parsing, OpenCL object creation and destruction, and OpenCL event processing.
B) Memory model: memory allocation and release, memory read/write operations, memory object mapping and unmapping, memory buffer object creation and release, and realization of work-group local memory.
C) Thread model: realizes the OpenCL thread model, comprising work items, work groups, and kernels.
Work item, work group: the OpenCL runtime system creates an integer index space, an N-dimensional grid of values where N is 1, 2, or 3, also called an NDRange. Each executing instance of the kernel is called a work item. A work item is identified in the entire index space by a global ID, just as a school identifies each student by a number. Work items are organized into work groups. The global index assigns a work-group ID to each work group (as a school numbers its classes) and a local ID to each work item within its work group (as a class numbers its students). Therefore, a work item can be located in coordinate fashion either through its global index, or through its work-group index number followed by its local index number.
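The index relations above can be sketched in C. This is an illustrative sketch only, not code from the invention; the helper names (`global_id_1d`, `locate_work_item`) are assumptions for the example. For a one-dimensional NDRange, a work item's global ID is its work-group ID times the work-group size plus its local ID, mirroring OpenCL's `get_group_id(0) * get_local_size(0) + get_local_id(0)`.

```c
#include <stddef.h>

/* Hypothetical helper: recover a work item's 1-D global ID from its
 * work-group ID and local ID within the group. */
size_t global_id_1d(size_t group_id, size_t local_size, size_t local_id) {
    return group_id * local_size + local_id;
}

/* Inverse lookup: find the work-group ID and local ID for a global ID,
 * i.e. locating a work item "in coordinate fashion". */
void locate_work_item(size_t global_id, size_t local_size,
                      size_t *group_id, size_t *local_id) {
    *group_id = global_id / local_size;
    *local_id = global_id % local_size;
}
```

With a work-group size of 64, work item 5 of group 2 has global ID 133, and either coordinate form identifies the same work item.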
D) Command queue: parses the command queue in the OpenCL application program and simulates the execution process of the instructions.
E) Synchronization mechanism: realizes the OpenCL synchronization mechanism, such as events, fences, atomic operations, and the like.
The synchronization mechanism of work items and work groups is as follows:
Mutex — lock the mutex before entering the critical section that accesses the shared resource, and release the lock after the access completes. Once the mutex is locked, any other thread attempting to lock it again will be blocked until the lock is released.
Read-write lock — if a thread holds a read lock, other threads may also acquire read locks but cannot acquire a write lock. If a thread holds a write lock, other threads can acquire neither a read lock nor a write lock.
Condition variable — a condition variable is used to block a thread until a particular condition occurs. Condition variables are typically used together with a mutex. A condition variable supports two actions: waiting (the thread blocks while the condition is not satisfied) and signaling (notifying the blocked thread to resume work).
Among synchronization mechanisms, semaphores are widely used for synchronization and mutual exclusion between processes or threads; a semaphore is essentially a non-negative integer counter used to control access to a common resource. When programming, whether the common resource may be accessed is judged from the semaphore value: when the value is greater than 0 the resource can be accessed, otherwise the caller blocks. The PV primitives operate on the semaphore: a P operation decrements the semaphore by 1, and a V operation increments it by 1.
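The P/V semantics just described can be sketched as a plain counter in C. This is a single-threaded illustration of the semantics only: a real semaphore (e.g., POSIX `sem_t`) adds atomicity and actually blocks the caller, and the `toy_sem`/`toy_P`/`toy_V` names are invented for the example.

```c
/* Minimal sketch of semaphore P/V semantics as a non-negative counter.
 * toy_P returns -1 instead of blocking when the counter is 0 (a real
 * semaphore would block the calling thread until a V occurs). */
typedef struct { int value; } toy_sem;

int toy_P(toy_sem *s) {           /* P: acquire, decrement by 1 */
    if (s->value <= 0) return -1; /* resource unavailable: would block */
    s->value -= 1;
    return 0;
}

void toy_V(toy_sem *s) {          /* V: release, increment by 1 */
    s->value += 1;
}
```

Starting from a value of 1, the first P succeeds, a second P would block (here it reports failure), and a V restores the count.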
4) Encoding format: FIG. 2 shows the encoding format of the OpenCL interface of the present invention. Packet encoding encapsulates the packet into an interface code, a total length, and the format of the parameters passed to the interface. The interface code has a special return-value type for serving the kinds of interfaces that have return values.
5) Coding process: different types of interfaces are handled differently, as follows:
1. No return value and the parameters are of numeric type: the interface code and parameters are written directly into the device memory.
2. A parameter passes array data: on the basis of case 1, the length of the array data is additionally recorded, and then the array data is written in full.
3. A return value exists: on the basis of case 1, a readback loop is called to wait for the return value from the Host side.
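The first two cases above can be sketched as a byte-level encoder in C. This is an assumption-laden illustration: the field order (interface code, total length, then parameters, with an extra 32-bit length word before array data) follows the description of FIG. 2, but the exact field widths and byte order of the patent's actual format are not specified, so the ones here are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the lightweight packet layout:
 * [interface code][total length][parameters...]. */
typedef struct {
    uint8_t  buf[256];
    uint32_t len;
} packet;

static void put_u32(packet *p, uint32_t v) {
    memcpy(p->buf + p->len, &v, 4);
    p->len += 4;
}

/* Case 1: no return value, numeric parameter. */
void encode_numeric(packet *p, uint32_t iface_code, uint32_t param) {
    p->len = 0;
    put_u32(p, iface_code);
    put_u32(p, 12);          /* total length: code + length + one param */
    put_u32(p, param);
}

/* Case 2: array parameter - record the array length, then the bytes. */
void encode_array(packet *p, uint32_t iface_code,
                  const uint8_t *data, uint32_t n) {
    p->len = 0;
    put_u32(p, iface_code);
    put_u32(p, 12 + n);      /* code + length + array length + payload */
    put_u32(p, n);
    memcpy(p->buf + p->len, data, n);
    p->len += n;
}
```

Case 3 would reuse case 1's layout and then poll the agreed memory region until the Host writes the return value back.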
2. Pipeline
The pipeline acquires data from the host side through the memory space of the virtual simulation platform and stores it into the send cache; the send thread traverses the data in the cache and sends it out via IPC. The receive thread stores incoming data into the receive cache through the IPC communication mechanism; the dispatcher then distributes the data, writing it into the client memory space according to whether it is interface data with a return value, ordinary interface data, or device-model data.
FIG. 3 illustrates the pipeline communication principle of the present invention. After the pipe receives client data, the dispatcher parses the device, e.g., screen/touch messages, peripheral buses, event messages, and so on. The dispatcher can distinguish the various data sources and perform the distribution operation.
Peripherals that need to transfer data through the pipe may register their own processing functions with the register_op_dispatcher_func and register_dispatcher_func functions. The former is used for communication protocols or methods requiring packetization; the latter is the conventional method.
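A dispatcher of this kind can be sketched as a table of handler function pointers keyed by data source. This is an illustrative sketch only: the real signatures of register_dispatcher_func and its companions in the simulation platform are not given in the text, so the types, the device-ID keying, and the touch-handler example below are all invented for the illustration.

```c
#include <stddef.h>

/* Sketch of a pipe dispatcher: each peripheral registers a handler for
 * its device ID, and incoming data is routed by that ID. */
#define MAX_DEVICES 16

typedef int (*dispatch_fn)(const void *data, size_t len);

static dispatch_fn handlers[MAX_DEVICES];

/* Illustrative stand-in for register_dispatcher_func. */
int register_dispatcher_func(unsigned dev_id, dispatch_fn fn) {
    if (dev_id >= MAX_DEVICES) return -1;
    handlers[dev_id] = fn;
    return 0;
}

/* Route one message: the dispatcher distinguishes sources by device ID. */
int dispatch(unsigned dev_id, const void *data, size_t len) {
    if (dev_id >= MAX_DEVICES || handlers[dev_id] == NULL) return -1;
    return handlers[dev_id](data, len);
}

/* Example handler: a touch-screen message sink that counts events. */
static int touch_events = 0;
static int touch_handler(const void *data, size_t len) {
    (void)data; (void)len;
    return ++touch_events;
}
```

Registering the touch handler under one device ID and dispatching to an unregistered ID shows the routing and the error path.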
FIG. 4 shows the principle of pipeline communication between different machines in the present invention. By means of different IPC communication technologies, the pipe can realize two communication schemes, local and remote, thereby supporting both local simulation and remote simulation. As shown in FIG. 4, for local communication, A and B may exist in the same process (B being a thread of A) or may be two separate processes running on one physical machine; for remote communication, A and B run as separate processes on two physical machines.
3. Client terminal
The client comprises four parts: a function decoder, simulation devices, a parallel operation framework, and a collaborative synchronization mechanism.
A) Function decoder: decoding is the reverse of encoding, and all packets of the OpenCL API function are subjected to interface parsing one by one.
B) Simulation devices: simulate the host processor CPU and the other hardware devices of the OpenCL device.
C) Parallel operation: simulates the OpenCL parallel model, including task parallelism and data parallelism, and implements parallel operations using multithreading or other means.
D) Synchronization mechanism: simulates the OpenCL event mechanism, used for synchronous control of parallel flows and mutual exclusion in the client workflow, specifically as follows:
The received data is decoded, parsed, and processed; according to the decoding result, the graphics card driver of the local PC is called to realize the driver interface corresponding to OpenCL. If an API needs a return value, or an input event generated by an input device is captured, the result of the local API call or the event information is transmitted back to the host through the pipe.
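The client's decode-and-call step can be sketched as a switch on the interface code. This is a hedged sketch under assumptions: the interface codes and the stub handlers below are invented for illustration, and a real client would call into the local graphics driver's OpenCL entry points rather than stubs.

```c
#include <stdint.h>
#include <string.h>

/* Invented interface codes for the sketch (a real table would cover the
 * full targeted OpenCL API version, one code per entry point). */
enum { OP_CL_CREATE_BUFFER = 1, OP_CL_ENQUEUE_KERNEL = 2 };

/* Stub "driver calls"; a real client would invoke the host GPU driver
 * and forward any return value back through the pipe. */
static int stub_create_buffer(void)  { return 100; }
static int stub_enqueue_kernel(void) { return 200; }

/* Decode one packet ([interface code][total length]...) and dispatch it.
 * Returns the handler's result, or -1 for an unknown interface code. */
int client_decode_one(const uint8_t *pkt) {
    uint32_t code;
    memcpy(&code, pkt, 4);
    switch (code) {
    case OP_CL_CREATE_BUFFER:  return stub_create_buffer();
    case OP_CL_ENQUEUE_KERNEL: return stub_enqueue_kernel();
    default:                   return -1;
    }
}
```

Decoding is driven entirely by the leading interface code, which is why the packet protocol only needs to describe the data grouping, not each interface.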
The semi-physical OpenCL model simulation device according to the present application may be implemented by one or more computing devices to perform the semi-physical OpenCL model simulation method. It should be noted that, in practice, the computing device used to implement the semi-physical OpenCL model simulation method of the present application may be any type of device, and the present application is not limited to a specific hardware configuration of the computing device. The computing device may preferably include a system memory and one or more processors. A memory bus may be used for communication between the processor and the system memory.
The processor may be any type of processor, depending on the desired configuration, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. A processor may include one or more levels of cache, such as a first-level cache and a second-level cache, a processor core, and registers. The processor core may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing (DSP) core, or any combination thereof. An example memory controller may be used with the processor or, in some implementations, the memory controller may be an internal part of the processor.
Depending on the desired configuration, the system memory may be any type of memory, including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. Physical memory in a computing device is often referred to as volatile memory, RAM, and data in disk needs to be loaded into physical memory in order to be read by a processor. The system memory may include an operating system, one or more applications, and program data. In some implementations, the application may be arranged to execute instructions on the operating system using program data by one or more processors. The operating system may be Linux, windows or the like, for example, that includes program instructions for handling basic system services and performing hardware-dependent tasks. Applications include program instructions for implementing various user-desired functions, and the applications may be, for example, but not limited to, browsers, instant messaging software, software development tools (e.g., integrated development environments IDE, compilers, etc.), and the like. When an application is installed into a computing device, a driver module may be added to the operating system.
When the computing device is started up, the processor reads the program instructions of the operating system from the system memory and executes the program instructions. Applications run on top of the operating system, utilizing the interfaces provided by the operating system and underlying hardware to implement various user-desired functions. When a user starts an application, the application is loaded into system memory, from which the processor reads and executes the program instructions of the application.
The computing device also includes a storage device that includes removable storage (e.g., CD, DVD, U disk, removable hard disk, etc.) and non-removable storage (e.g., hard disk drive HDD, etc.), both of which are connected to the storage interface bus.
The computing device may also include a storage interface bus. The storage interface bus enables communication from storage devices (e.g., removable storage and non-removable storage) to the basic configuration via the bus/interface controller. At least a portion of the operating system, applications, and program data may be stored on removable storage and/or non-removable storage and loaded into system memory via a storage interface bus and executed by one or more processors when the computing device is powered up or the application is to be executed.
The computing device may also include an interface bus that facilitates communication from various interface devices (e.g., output devices, peripheral interfaces, and communication devices) to the basic configuration via the bus/interface controller. An example output device includes an image processing unit and an audio processing unit. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports. Example peripheral interfaces may include a serial interface controller and a parallel interface controller, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports. An example communication device may include a network controller that may be arranged to facilitate communication with one or more other computing devices via one or more communication ports over a network communication link.
The computing device may be implemented as a personal computer including desktop and notebook computer configurations. Of course, the computing device may also be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular telephone, digital camera, personal Digital Assistant (PDA), personal media player device, wireless web-watch device, personal headset device, application specific device, or hybrid device that may include any of the above functions. And may even be implemented as servers, such as file servers, database servers, application servers, WEB servers, and the like. The embodiments of the present application are not limited in this regard.
In an embodiment according to the application, the computing device is configured to perform a semi-physical OpenCL model emulation method according to the application. Wherein the application disposed on the operating system contains a plurality of program instructions for executing the method, which program instructions can instruct the processor to perform the method of the present application.
According to the semi-physical OpenCL model simulation device and computing equipment described above, on the basis of semi-physical OpenCL simulation, OpenCL function simulation compatible with different embedded systems is achieved; by adopting a multi-port communication mode, the problem of complex GPU device attributes is avoided, and low-burden virtual simulation of the GPU's OpenCL logic functions on a virtual simulation platform is realized. Compared with the prior art, the scheme is compatible at low cost with the software/application layer running any graphics application, is easy to extend, and can quickly realize simulation of various GPU devices without additional work overhead.
Compared with the prior art, the invention has the following advantages:
Advantage 1: good extensibility and high flexibility.
The invention is based on interface simulation: only the statically/dynamically coded versions of standard graphics libraries such as OpenCL corresponding to the target GPU need to be realized. Because both interface specifications are backward compatible, once most interfaces are realized, implementing a further GPU simulation module requires simulating only a small number of additional interfaces. Meanwhile, the whole simulation model is built on a modular basis, is very flexible, and is easy to extend.
Advantage 2: strong performance.
In essence, the method directly calls the host machine's GPU device to realize the functions of the target GPU. Compared with a traditional processor device simulation model, the operating efficiency of directly calling the host GPU device is certainly higher than that of a fully simulated GPU device.
Advantage 3: good universality.
In this technical scheme, each component is plugged in as a module and can be fine-tuned according to the actual service scenario to meet the differentiated requirements of different services. Universality is good and adaptability is high.
Fig. 1 shows a semi-physical OpenCL model simulation method of the present application, and it should be noted that, the semi-physical OpenCL model simulation method may be executed in a semi-physical OpenCL model simulation device, and descriptions about the semi-physical OpenCL model simulation device and descriptions about the semi-physical OpenCL model simulation method are complementary, and are not repeated. The semi-physical OpenCL model simulation method is realized by combining hardware and software. The semi-physical OpenCL model simulation method is suitable for any intelligent host or processor with a network communication function.
The invention provides a semi-physical OpenCL model simulation method, which comprises the following steps: and respectively establishing simulation abstraction layers of the target machine, the pipeline and the client, and adopting a static/dynamic coding technology to cooperate with the driving abstraction layers. The abstraction layer is based on modular construction and is loaded by the virtual simulation platform in the form of a dynamic library.
A memory model is created at the simulation abstraction layer on the target machine side and loaded through a corresponding process. A corresponding process and threads are created at the simulation abstraction layer on the pipeline side to monitor the data updated by the memory model; a scheduler parses the device data to distinguish the various data sources and performs the distribution operation, and buffer areas and threads are allocated to distribute and schedule the simulation parameters and results to the target machine side after the process interaction of inter-process communication.
The process interaction of the inter-process communication includes: the pipeline accesses and forwards data packets created by the encoding packet protocol, where the encoding packet protocol encodes the data packets using an API data stream and a lightweight description. The lightweight description of a packet comprises the interface code, the total length, and the format of the parameters passed to the interface. After the pipeline obtains the encoded packet data, multi-process/multi-thread/multi-machine communication is realized through a TCP/UDP transmission mechanism. Allocating buffer areas and threads to distribute data includes sending the touch signals of a touch screen device to the pipeline and distributing them through the pipeline's dispatcher, realizing the scenario of picture display with touch screen interaction.
A process that is compatible with the graphics card driver and can communicate with the pipeline is created at the simulation abstraction layer on the client side; the model receives the data from the inter-process communication to perform synchronous parallel simulation operations, and returns the corresponding simulation parameters and results to the pipeline side after the simulation operation. The graphics card driver loads the simulation result on the client side and outputs it to the corresponding display peripheral.
More specifically, the invention adopts a semi-physical simulation form, realized by intercepting the OpenCL API and translating it to run on the host machine, in essence leveraging the host machine's GPU capability. The simulated GPU module consists of three parts: host, pipe, and client. The host part is implemented in the target machine program; its main function is to replace the OpenCL library, encode the APIs and parameters, and write them into the agreed memory space. The pipe captures the encoded data and sends it to the client via TCP/UDP or another IPC mechanism. The client runs in the virtual simulation platform; after receiving the data, it decodes, translates, and executes it.
Inter-process communication (IPC) is a set of programming interfaces that allow programmers to coordinate different processes running simultaneously in an operating system and to communicate and exchange information with each other. This enables one program to handle the requirements of many users at the same time. Even a single user may issue multiple requests, so multiple processes may run in the operating system, and those processes must communicate with each other; the IPC interfaces offer this possibility. Each IPC method has its own advantages and limitations; in general, it is rare for a single program to use all IPC methods.
IPC methods include PIPE (PIPE), message queuing, semaphore, shared memory, and Socket.
Pipes may include three types:
1) Ordinary pipe: generally has two limitations. First, it is simplex; only one-way transmission is possible. Second, it can only be used between parent-child or sibling processes.
2) Stream pipe: removes the first limitation, allowing two-way, half-duplex transmission.
3) Named pipe: removes the second limitation, allowing communication between unrelated processes.
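The ordinary pipe's one-way byte stream can be demonstrated with the POSIX pipe() call. For brevity this sketch keeps both ends in a single process, though in practice the two ends would usually sit in a parent and a child process; the function name is invented for the example.

```c
#include <unistd.h>
#include <string.h>

/* Write a message into an anonymous pipe and read it back, returning the
 * number of bytes recovered (or -1 on failure). Illustrates the ordinary
 * pipe's simplex byte stream: fd[1] is write-only, fd[0] is read-only. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap) {
    int fd[2];
    if (pipe(fd) != 0) return -1;
    ssize_t w = write(fd[1], msg, strlen(msg));  /* send end */
    close(fd[1]);
    ssize_t r = read(fd[0], out, cap);           /* receive end */
    close(fd[0]);
    return (w < 0) ? -1 : r;
}
```

Writing in the reverse direction (into fd[0]) would fail, which is exactly the simplex limitation the stream pipe removes.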
Inter-process communication may serve the following functions:
1) Data transfer: one process needs to send its data to another process, with the amount of data sent ranging from one byte to several megabytes.
2) Data sharing: multiple processes want to operate on shared data; when one process modifies the shared data, the others should see the change at once.
3) Event notification: one process needs to send a message to another process or group of processes informing it (or them) that something has happened (e.g., the parent process is to be informed when a process terminates).
4) Resource sharing: the same resource is shared among multiple processes. To do this, the kernel must provide locking and synchronization mechanisms.
5) Process control: some processes want full control over the execution of another process (e.g., a debugger); the controlling process wants to be able to intercept all traps and exceptions of the other process and to learn of its state changes in time.
Processes coordinate their behavior by communicating with the kernel and other processes. Linux supports a variety of inter-process communication (IPC) mechanisms, two of which are signal and pipe. In addition, linux supports the IPC mechanism of System V (named with the first-emerging Unix version).
According to an embodiment of the present application, allocating buffer areas and threads to distribute and schedule the simulation parameters and results after the process interaction of inter-process communication includes: module A encodes and packs the data and transmits it to module B through an IPC protocol; module B receives and decodes the corresponding data through the IPC protocol and performs decoding analysis on the received data; the simulation operation result is then output to the corresponding display peripheral.
According to an embodiment of the present application, performing decoding analysis on the received data and outputting the simulation operation result to the corresponding display peripheral includes: interface parsing of the packets of all OpenCL API functions one by one; calling, according to the decoding result, the local machine's graphics card driver to realize the driver interface corresponding to OpenCL. If an API needs a return value, or input events generated by input devices are captured, the result of the local API call or the event information is transmitted back to the target machine through the pipeline. The host processor CPU and the other hardware devices of the OpenCL device are emulated. The OpenCL parallel model, including task parallelism and data parallelism, is simulated, using multithreading or other means to implement parallel operations. The OpenCL event mechanism is simulated for synchronous control and mutual exclusion of parallel flows.
According to the embodiment of the application, the implementation of the driving interface corresponding to the OpenCL comprises the analysis and the processing of a memory model, a kernel model, a thread model, a command queue and a synchronous mechanism model. The processing kernel model comprises the steps of analyzing an OpenCL kernel function, creating and destroying an OpenCL object and processing an OpenCL event. The processing of the memory model comprises the operations of allocating and releasing the memory, performing read-write operation on the memory, mapping and canceling the memory object, creating and releasing the memory buffer area object and performing operation on the local memory of the working group. The processing thread model includes processing work items, work groups, and kernel thread assignments. Processing the command queue includes parsing the command queue in the OpenCL application, simulating the execution of the instruction. The process synchronization mechanism model includes process event synchronization, and synchronization of process work items and work groups.
According to the embodiment of the application, the A module and the B module exist in the same process, and the B module is a thread of the A module. Or the A module and the B module are used as two independent processes, and the A module and the B module run on a physical machine in parallel. Or the A module and the B module respectively run on two physical machines supporting remote communication, and the A module is operated as a separate process to encode the data packet and send the data to the B module through an IPC protocol.
According to an embodiment of the present application, the result or event information of the API call is handled as follows. If the parameter is of numeric type, the interface code and parameters are written directly into the device memory. If the parameter is of array-data type, the interface code and parameters are written directly into the device memory, the length of the array data is additionally recorded, and then the array data is written in full. When there is a return value, the interface code and parameters are written directly into the device memory and a readback loop is called to wait for the return value from the host side.
According to an embodiment of the present application, performing decoding analysis on the data includes: function decoding, which performs interface parsing on the packets of all OpenCL API functions one by one; device simulation, which simulates the host processor CPU and the other hardware devices of the OpenCL device; the parallel operation framework; and the collaborative synchronization mechanism, which simulates the OpenCL event mechanism to perform synchronous control and mutual exclusion of the parallel flows.
The implementation means of the semi-physical OpenCL model simulation method specifically comprises the following five aspects:
The invention adopts an interface coding method based on static/dynamic coding technology and a driving abstract layer.
And the static/dynamic coding technology is used to cooperate with a driving abstract layer, so that the support of multiple platforms and multiple architectures is realized. Static coding technology participates in the compiling process of the target program, and can shield instruction set differences of different architecture processors. The dynamic coding technology does not need to modify the source code of the target program, can reduce the workload of development and debugging, has complementary advantages with the static coding technology, and meets different application scenes. The driver abstraction layer may support a variety of different operating systems, masking operating system differences. Based on the technical combination, the host can reduce the workload of adapting to various complex service scenes in the implementation process, greatly improve the development efficiency and improve the deployment speed.
And secondly, the invention adopts an easily-expanded interface coding packet protocol.
The encoded packet protocol adopts an API data-stream style of service with a lightweight description format, so the data packets can be accessed directly by the pipe and are convenient to forward. The protocol focuses on describing the data, one level of abstraction above describing each interface; therefore, for different interfaces, the protocol is concerned only with how the data is grouped, which makes it easy to extend to the data of new interfaces.
And thirdly, the pipeline communication technology based on the IPC mechanism.
After the pipe acquires the encoded data, multi-process/multi-thread/multi-machine communication can conveniently be realized through a TCP/UDP transmission mechanism, greatly enriching the deployable service scenarios.
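Locally, the same send/receive path can be exercised with a socket pair. This is a single-process sketch of the transport step only, under assumptions: real deployments would use TCP/UDP sockets between processes or machines, which this example does not set up, and the function name is invented for the illustration.

```c
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

/* Send encoded bytes over a local socket pair and read them back,
 * standing in for the pipe's TCP/UDP transmission step. Returns the
 * number of bytes received, or -1 on error. */
ssize_t ipc_roundtrip(const void *data, size_t len, void *out, size_t cap) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    ssize_t w = send(sv[0], data, len, 0);   /* pipe side sends   */
    ssize_t r = recv(sv[1], out, cap, 0);    /* client side receives */
    close(sv[0]);
    close(sv[1]);
    return (w < 0) ? -1 : r;
}
```

Swapping the socket pair for a connected TCP socket would yield the multi-machine variant without changing the surrounding send/receive logic.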
Fourth, the present invention supports a scheduler method for multi-device module data distribution.
Besides processing encoded data, the pipe can also serve accessory devices in certain GPU application scenarios, such as a scenario in which the GPU and a touch screen together realize picture display and touch interaction: the touch signals of the touch screen device are likewise sent to the pipe and distributed through the pipe's dispatcher, which simplifies the complexity of the device model.
And fifthly, the invention is based on modularized construction, flexible and easy to expand.
The whole GPU simulation equipment is based on modular construction and is loaded by a virtual simulation platform in the form of a dynamic library. The coupling with the platform is low, and the model can be loaded according to actual needs, so that the GPU simulation function is realized.
Certain aspects or portions of the methods and apparatus of the present application may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, U-disk, floppy diskettes, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the application.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store program code; the processor is configured to execute the semi-physical OpenCL model simulation method of the present application in accordance with instructions in said program code stored in the memory.
By way of example, and not limitation, readable media comprise readable storage media and communication media. The readable storage medium stores information such as computer readable instructions, data structures, program modules, or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, the algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with examples of the application; the required structure for such a system is apparent from the description above. In addition, the present application is not directed to any particular programming language. It should be appreciated that the teachings of the present application may be implemented in a variety of programming languages, and that the foregoing descriptions of specific languages are provided to disclose preferred embodiments of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules or units or components of the apparatus in the examples disclosed herein may be arranged in an apparatus as described in this embodiment, or alternatively may be located in one or more devices different from the apparatus in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of an embodiment may be adaptively changed and disposed in one or more devices different from those of the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. All features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in any combination, except combinations where at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means of performing the functions. Thus, a processor with the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, the elements of the apparatus embodiments described herein are examples of apparatus for performing the functions performed by those elements for the purpose of carrying out this disclosure.
According to the semi-physical OpenCL model simulation method provided by the invention, on the basis of semi-physical OpenCL simulation, the purpose of OpenCL function simulation compatible with different embedded systems is achieved. By adopting a multi-port communication mode, the problem of complex GPU device attributes can be avoided, and low-burden virtual simulation of the GPU's OpenCL logic functions based on a virtual simulation platform can be realized. Compared with the prior art, the method is compatible at low cost with a software layer/application layer running any graphics application, is easy to extend, and can quickly realize simulation of various GPU devices without additional work overhead.
Compared with the prior art, the invention has the following advantages:
First advantage: good extensibility and high flexibility.
The invention is based on interface simulation: it only needs to implement the statically and dynamically encoded versions of graphics standard libraries such as OpenCL corresponding to the target GPU. Because both interface specifications are backward compatible, once most interfaces are implemented, implementing another GPU simulation module requires simulating only a small number of additional interfaces. Meanwhile, the whole simulation model is built modularly, making it very flexible and easy to extend.
Second advantage: strong performance.
The essence of the method is to directly call the host's GPU device to complete the functional implementation of the target GPU. Compared with a traditional processor-device simulation model, the efficiency of directly calling the host's GPU device is certainly higher than that of a fully simulated GPU device.
Third advantage: good universality.
In the technical scheme, each component is pluggable and can be fine-tuned according to actual service scenarios to meet the differentiated requirements of different services. Universality is good and adaptability is high.
The above describes embodiments of the semi-physical OpenCL model simulation method, device and computing device, in order to explain the spirit of the present invention. Note that those skilled in the art can modify and combine the features of the above embodiments without departing from the spirit of the present invention; therefore, the present invention is not limited to the above embodiments. The specific features of the semi-physical OpenCL model simulation device and computing device of the present invention, such as shape, size and position, can be concretely designed according to the effects of the features disclosed above, and such designs can be implemented by those skilled in the art. Moreover, each feature disclosed above is not limited to combination with the other features disclosed alongside it; those skilled in the art may make other combinations of features in accordance with the purpose of the present invention, so as to achieve the purpose of the present invention.

Claims (10)

1. A semi-physical OpenCL model simulation method, characterized by comprising the following steps:
respectively establishing simulation abstraction layers of a target machine, a pipeline and a client, and driving the abstraction layers by adopting a static/dynamic coding technology in a matching way; the abstract layer is constructed based on modularization and is loaded by the virtual simulation platform in the form of a dynamic library;
Creating a memory model at a simulation abstract layer at one side of the target machine and loading the memory model through a corresponding process;
Creating corresponding processes and threads at a simulation abstract layer at one side of the pipeline to monitor data updated by a memory model, analyzing equipment by a scheduler to distinguish various data sources and perform distribution operation, and distributing and scheduling simulation parameters and results after process interaction of inter-process communication to one side of the target machine by distributing a cache area and threads; wherein,
The process interaction of the inter-process communication includes: the pipeline accesses and forwards the data packet created by the coding packet protocol, wherein the coding packet protocol adopts API data flow and lightweight description to code the data packet; wherein the lightweight description performs packet encoding including encoding of the packet into an interface, a total length, and a format of the parameters transferred to the interface;
after the pipeline acquires the packet-encoded data, the pipeline realizes multi-process/multi-thread/multi-machine communication through a TCP/UDP transmission mechanism; allocating the buffer area and threads to distribute data comprises sending contact signals of a touch-screen device to the pipeline and distributing the contact signals through a dispatcher of the pipeline, so as to realize the scenario of picture display and touch-screen interaction;
creating, at a simulation abstraction layer on the client side, a process compatible with display-card driver operation and capable of communicating with the pipeline, the model receiving the inter-process communication data to perform synchronous parallel simulation operations, and returning corresponding simulation parameters and results to the pipeline side after the simulation operations;
and the display card drives to load the simulation operation result at one side of the client and outputs the simulation operation result to the corresponding display peripheral.
2. The method of claim 1, wherein allocating buffers and threads to distribute and schedule simulation parameters and results after process interactions via inter-process communication comprises:
The A module encodes the data group package and sends the data to the B module through an IPC protocol;
the B module receives corresponding data through an IPC protocol, decodes the data, and decodes, analyzes and processes the received data; and outputting the simulation operation result to the corresponding display peripheral.
3. The method of claim 2, wherein performing decoding analysis processing on the received data and outputting simulation operation results to the corresponding display peripheral devices comprises:
interface analysis is carried out on all the data packets of the OpenCL API function one by one; and according to the decoding result, calling a display card driver of the local machine to realize a driver interface corresponding to the OpenCL;
if the API needs to return a value, or input events generated by certain input devices are captured, returning the result of the locally called API or the event information to the target machine through the pipeline;
simulating a host processor CPU and other hardware devices of the OpenCL device;
the parallel model for simulating OpenCL comprises task parallelism and data parallelism, and parallel operation is realized by using multithreading or other modes;
the event mechanism of the OpenCL is simulated and used for synchronously controlling and mutually exclusive parallel flows.
4. The method of claim 3, wherein implementing the drive interface for the corresponding OpenCL includes analyzing and processing a memory model, a kernel model, a thread model, a command queue, and a synchronization mechanism model;
the processing kernel model comprises the steps of analyzing an OpenCL kernel function, creating and destroying an OpenCL object and processing an OpenCL event;
processing the memory model comprises the operations of distributing and releasing the memory, performing read-write operation on the memory, mapping and canceling the memory object, creating and releasing the memory buffer area object and performing local memory operation on the working group;
The processing thread model comprises the steps of processing work items, work groups and kernel thread allocation;
Processing the command queue comprises analyzing the command queue in the OpenCL application program and simulating the execution process of the instruction;
the process synchronization mechanism model includes process event synchronization, and synchronization of process work items and work groups.
5. The method of claim 2, wherein,
The A module and the B module exist in the same process, and the B module is a thread of the A module; or the A module and the B module are used as two independent processes, and the A module and the B module run on a physical machine in parallel; or the A module and the B module respectively run on two physical machines supporting remote communication, and the A module is operated as a separate process to encode the data packet and send the data to the B module through an IPC protocol.
6. The method of claim 3, wherein returning the result or event information of the API execution comprises:
if the parameter is of a numerical value type, directly writing the interface code and the parameter into the equipment memory;
If the parameter is the type of the transfer array data, the interface code and the parameter are directly written into the equipment memory, the length of the array data is additionally recorded, and then the array data are completely written;
When there is a return value, the interface code and parameters are directly written into the device memory and the loop of read back is called to wait for the return value of the host side.
7. The method of claim 2, wherein performing a decoding analysis process on the data comprises:
Function decoding, namely carrying out interface analysis on the data packets of all the OpenCL API functions one by one;
a simulation device simulating a host processor CPU and other hardware devices of the OpenCL device; a parallel operation framework;
and processing a collaborative synchronization mechanism, simulating an event mechanism of OpenCL, and performing synchronous control and mutual exclusion on the parallel flow.
8. A semi-physical OpenCL model simulation device, characterized by comprising a target machine, a pipeline and a client:
Respectively establishing simulation abstract layers of a target machine, a pipeline and a client on corresponding equipment, and driving the abstract layers in a matched manner by adopting a static/dynamic coding technology; the abstract layer is constructed based on modularization and is loaded by the virtual simulation platform in the form of a dynamic library;
Creating a memory model at a simulation abstract layer at one side of the target machine and loading the memory model through a corresponding process;
Creating corresponding processes and threads at a simulation abstract layer at one side of the pipeline to monitor data updated by a memory model, analyzing equipment by a scheduler to distinguish various data sources and perform distribution operation, and distributing and scheduling simulation parameters and results after process interaction of inter-process communication to one side of the target machine by distributing a cache area and threads; wherein,
The process interaction of the inter-process communication includes: the pipeline accesses and forwards the data packet created by the coding packet protocol, wherein the coding packet protocol adopts API data flow and lightweight description to code the data packet; wherein the lightweight description performs packet encoding including encoding of the packet into an interface, a total length, and a format of the parameters transferred to the interface;
after the pipeline acquires the packet-encoded data, the pipeline realizes multi-process/multi-thread/multi-machine communication through a TCP/UDP transmission mechanism; allocating the buffer area and threads to distribute data comprises sending contact signals of a touch-screen device to the pipeline and distributing the contact signals through a dispatcher of the pipeline, so as to realize the scenario of picture display and touch-screen interaction;
creating, at a simulation abstraction layer on the client side, a process compatible with display-card driver operation and capable of communicating with the pipeline, the model receiving the inter-process communication data to perform synchronous parallel simulation operations, and returning corresponding simulation parameters and results to the pipeline side after the simulation operations;
and the display card drives to load the simulation operation result at one side of the client and outputs the simulation operation result to the corresponding display peripheral.
9. A computing device, comprising:
one or more processors;
A memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-7.
10. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the method of any of claims 1-7.
CN202410320231.4A 2024-03-20 2024-03-20 Semi-physical OpenCL model simulation method and device and computing equipment Pending CN118210242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410320231.4A CN118210242A (en) 2024-03-20 2024-03-20 Semi-physical OpenCL model simulation method and device and computing equipment


Publications (1)

Publication Number Publication Date
CN118210242A true CN118210242A (en) 2024-06-18

Family

ID=91449994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410320231.4A Pending CN118210242A (en) 2024-03-20 2024-03-20 Semi-physical OpenCL model simulation method and device and computing equipment

Country Status (1)

Country Link
CN (1) CN118210242A (en)

Similar Documents

Publication Publication Date Title
JP5679989B2 (en) Debug pipeline
US9632761B2 (en) Distribute workload of an application to a graphics processing unit
US20080216064A1 (en) Method, Architecture and Software of Meta-Operating System, Operating Systems and Applications For Parallel Computing Platforms
US20090083753A1 (en) Dynamic thread generation and management for improved computer program performance
US8543975B2 (en) Behavior-first event programming model
Silva Pro Android Games
Lesage et al. A hierarchical component model for large parallel interactive applications
Wenzel et al. Getting started with CAPI SNAP: Hardware development for software engineers
CN118210242A (en) Semi-physical OpenCL model simulation method and device and computing equipment
Shen et al. HCI⁁ 2 Workbench: A development tool for multimodal human-computer interaction systems
Gioachin et al. Debugging large scale applications in a virtualized environment
US8984473B2 (en) Methods for type analysis in systems for code generation
CN112306539A (en) Method, system, terminal and medium for developing application layer of single chip microcomputer
Soulami Inside windows debugging
US20060288352A1 (en) Data Processing Method and System
CN116010037A (en) GPU simulation method and system based on virtual simulation platform
Campeanu GPU support for component-based development of embedded systems
US11327778B2 (en) Application deployment using reduced overhead bytecode
Barbier Reactive Internet Programming: State Chart XML in Action
Thomsen et al. FACILE—from toy to tool
Silva et al. Advanced android 4 games
Tevanian Jr et al. MACH: The model for future UNIX
Luecke Software Development for Parallel and Multi-Core Processing
Beebe A Bibliography of Publications in ACM SIGAda Ada
Yin et al. Implementing high performance remote method invocation in cca

Legal Events

Date Code Title Description
PB01 Publication