CN116228515B - Hardware acceleration system, method and related device - Google Patents


Info

Publication number
CN116228515B
Authority
CN
China
Legal status
Active
Application number
CN202310502779.6A
Other languages
Chinese (zh)
Other versions
CN116228515A (en)
Inventor
郑瀚寻
马学韬
杨龚轶凡
闯小明
陆涵枭
Current Assignee
Suzhou Yangsiping Semiconductor Co ltd
Original Assignee
Suzhou Yangsiping Semiconductor Co ltd
Application filed by Suzhou Yangsiping Semiconductor Co ltd
Priority to CN202310502779.6A
Publication of CN116228515A
Application granted
Publication of CN116228515B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the application provide a hardware acceleration system, a hardware acceleration method and a related device. The method comprises the following steps: a processor acquires a first executable program; the processor identifies modules to be accelerated in the first executable program, where each module to be accelerated implements a specific computing function of the first executable program and the modules to be accelerated in the first executable program are executed sequentially; the processor reconstructs the modules to be accelerated into target modules to obtain a second executable program, in which a plurality of target modules are executed concurrently; and the processor sends the second executable program to an FPGA, which executes it to achieve hardware acceleration of the object positioning system. By replacing the sequentially executed modules to be accelerated with concurrently executed target modules, the method and device achieve hardware acceleration of specific computing functions in the executable program and build a pipeline at the level of the hardware circuit structure, thereby improving data processing efficiency.

Description

Hardware acceleration system, method and related device
Technical Field
Embodiments of the application relate to the technical field of data processing, and in particular to a hardware acceleration system, a hardware acceleration method and a related device.
Background
At present, most computing functions in object positioning programs are implemented with linear (sequential) operation.
Specifically, linear operation executes the steps of a program one after another, and each step invokes hardware resources to carry out its part of the data processing flow. This mode has simple control logic and saves hardware resources. However, linear operation is inefficient: it consumes more data processing resources and cannot cope with scenarios with high data throughput.
Therefore, a new solution is needed that optimizes linear operation in object positioning programs, improves data processing efficiency, and copes with scenarios involving large data volumes.
Disclosure of Invention
Embodiments of the application provide an improved hardware acceleration system, hardware acceleration method and related device, which achieve hardware acceleration of an object positioning program, build a pipeline at the level of the hardware circuit structure, improve data processing efficiency and increase program execution speed.
In a first aspect of the application, a hardware acceleration system is provided, comprising a processor and a field programmable gate array (FPGA), wherein:
the processor is configured to acquire a first executable program used to implement object positioning; identify modules to be accelerated in the first executable program, where each module to be accelerated implements a specific computing function in the first executable program and the modules to be accelerated in the first executable program are executed sequentially; reconstruct the modules to be accelerated into target modules to obtain a second executable program, in which a plurality of target modules are executed concurrently; and send the second executable program to the FPGA; and
the FPGA is configured to receive the second executable program and execute it to achieve hardware acceleration of the object positioning program.
In a second aspect of the application, a processor for use in an object positioning system is provided, the processor comprising:
a transceiver module configured to acquire a first executable program used to implement object positioning; and
a processing module configured to identify modules to be accelerated in the first executable program, where each module to be accelerated implements a specific computing function in the first executable program and the modules to be accelerated are executed sequentially; reconstruct the modules to be accelerated into target modules to obtain a second executable program; and send the second executable program to an FPGA. The target modules in the second executable program are executed concurrently, achieving hardware acceleration of the object positioning program in the FPGA.
In a third aspect of the application, an FPGA for use in an object positioning system is provided, the FPGA comprising:
a transceiver module configured to receive a second executable program from the processor, the second executable program containing a plurality of concurrently executed target modules, which the processor obtains by identifying modules to be accelerated in a first executable program used to implement object positioning and reconstructing them; and
a processing module configured to execute the second executable program to achieve hardware acceleration of the object positioning program.
In a fourth aspect of the application, a hardware acceleration method for an object positioning system is provided, applied to a processor in the object positioning system, the method comprising:
acquiring a first executable program used to implement object positioning;
identifying modules to be accelerated in the first executable program, where each module to be accelerated implements a specific computing function in the first executable program and the modules to be accelerated are executed sequentially;
reconstructing the modules to be accelerated into target modules to obtain a second executable program, in which a plurality of target modules are executed concurrently; and
sending the second executable program to an FPGA so that the FPGA executes it to achieve hardware acceleration of the object positioning system.
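The processor-side method of the fourth aspect can be pictured as a small driver that chains its four steps. The Python sketch below is a hypothetical model for illustration only, not the applicant's implementation: `Program`, `accelerate`, and the stand-in callables `identify`, `reconstruct` and `send_to_fpga` are assumed names, and a program is reduced to an ordered list of module names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Program:
    """Hypothetical model: ordered module names plus the names of modules
    scheduled to run concurrently (empty for a purely sequential program)."""
    modules: List[str]
    concurrent: List[str] = field(default_factory=list)


def accelerate(
    first: Program,
    identify: Callable[[Program], List[str]],
    reconstruct: Callable[[str], str],
    send_to_fpga: Callable[[Program], None],
) -> Program:
    # The first executable program has already been acquired and is passed in.
    # Identify the modules to be accelerated.
    to_accelerate = set(identify(first))
    # Reconstruct each of them into a target module; other modules are kept as-is.
    modules: List[str] = []
    targets: List[str] = []
    for m in first.modules:
        if m in to_accelerate:
            t = reconstruct(m)
            modules.append(t)
            targets.append(t)
        else:
            modules.append(m)
    second = Program(modules=modules, concurrent=targets)
    # Send the second executable program to the FPGA (here: any callback).
    send_to_fpga(second)
    return second


# Usage with stand-in callables
sent: List[Program] = []
second = accelerate(
    Program(["load", "transform", "locate"]),
    identify=lambda p: ["transform", "locate"],
    reconstruct=lambda m: m + "_target",
    send_to_fpga=sent.append,
)
print(second.modules)  # ['load', 'transform_target', 'locate_target']
```

Passing the steps in as callables keeps the driver independent of how identification, reconstruction and transfer are actually realized, which the application leaves open to several embodiments.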
In a fifth aspect of the application, a hardware acceleration method for an object positioning system is provided, applied to an FPGA in the object positioning system, the method comprising:
receiving a second executable program from the processor, the second executable program containing a plurality of concurrently executed target modules, which the processor obtains by identifying modules to be accelerated in a first executable program used to implement object positioning and reconstructing them; and
executing the second executable program to achieve hardware acceleration of the object positioning system.
In a sixth aspect of the application, a computer-readable storage medium is provided, comprising instructions which, when run on a computer, cause the computer to perform the hardware acceleration method of the fourth aspect.
In a seventh aspect of the application, a computing device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the hardware acceleration method of the fourth aspect when executing the computer program.
In the technical solution provided by the embodiments of the application, the processor acquires a first executable program, for example one used to implement object positioning. The processor then identifies the modules to be accelerated in the first executable program, i.e. the parts that need acceleration. Each module to be accelerated implements a specific computing function of the first executable program, so speeding up these parts speeds up the program as a whole. Finally, the processor reconstructs the modules to be accelerated into target modules to obtain a second executable program and sends it to the FPGA for execution; because the target modules in the second executable program can execute concurrently, the object positioning program is accelerated in hardware. In short, by identifying the modules that carry specific computing functions and replacing these sequentially executed modules with concurrently executed target modules, the embodiments accelerate those computing functions in hardware and build a pipeline at the level of the hardware circuit structure. This changes how the executable program is executed, greatly improving its data processing efficiency and execution speed.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically illustrates a flow diagram of a hardware acceleration method according to the present application;
FIG. 2 schematically illustrates a flow diagram of a second executable program acquisition process in accordance with the present application;
FIG. 3 schematically illustrates a flow diagram of a second program information acquisition process according to the present application;
FIG. 4 schematically illustrates a flow diagram of a pipeline optimization process in accordance with the present application;
FIG. 5 schematically illustrates a pipeline optimization process according to the present application;
FIG. 6 schematically illustrates another diagram of a pipeline optimization process according to the present application;
FIG. 7 schematically illustrates a flow diagram of a device binary acquisition process in accordance with the present application;
FIG. 8 schematically illustrates another flow diagram of a device binary acquisition process in accordance with the present application;
FIG. 9 schematically illustrates a flow diagram of a hardware acceleration system according to the present application;
FIG. 10 schematically illustrates a hardware acceleration device according to the present application;
FIG. 11 schematically illustrates a structural diagram of a computing device according to the present application;
FIG. 12 schematically illustrates a structural diagram of a server according to the present application.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present application will be described below with reference to several exemplary embodiments. It should be understood that these examples are given solely to enable those skilled in the art to better understand and practice the present application and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the application may be implemented as a system, apparatus, device, method or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
At present, most computing functions in object positioning programs are implemented with linear operation. In linear operation, the steps are executed in the order defined in the program, and each step invokes hardware resources to carry out its part of the data processing flow. This mode has simple control logic and saves hardware resources, but it is inefficient, consumes more data processing resources, and cannot cope with scenarios with high data throughput.
Therefore, a new solution is needed to optimize linear operation in programs (such as object positioning programs), improve data processing efficiency, and cope with scenarios involving large data volumes.
To overcome these technical problems, embodiments of the application provide a hardware acceleration system, a method and a related device.
The applicant has found that in the related art, an executable program is fed into existing software for translation and integration to obtain a code file that a device can execute, and the device then implements the program based on that code file. However, this conversion is usually a mechanical translation with significant limitations: it easily introduces a large amount of redundancy into the code file, so the device performs many useless operations that occupy its processing resources and lower its processing efficiency.
Compared with the related-art approach of converting the program directly with existing software, the technical solution of the embodiments identifies the modules to be accelerated that carry specific computing functions, optimizes the executable program for acceleration in a targeted way, and replaces those sequentially executed modules with concurrently executed target modules, thereby accelerating the specific computing functions in hardware and greatly improving data processing efficiency. The solution builds a concurrently executing pipeline at the level of the hardware circuit structure, which changes how the program is executed and avoids both the linear processing flow of the prior art and the processing resources wasted on redundant operations introduced by direct conversion. As a result, the data processing efficiency and execution speed of the executable program are greatly improved.
As an optional embodiment, there may be one or more hardware acceleration devices. A hardware acceleration device may be implemented as a logic unit inside a chip or in another form within a digital circuit structure; the application does not limit this. For example, the hardware acceleration device may be provided in the processing unit of various devices (e.g., terminal devices and servers).
Any number of elements in the figures is for illustration rather than limitation, and all names are used only for distinction and carry no limiting meaning.
A hardware acceleration method according to an exemplary embodiment of the application is described below with reference to FIG. 1 in conjunction with a specific application scenario. Note that the application scenario is shown only to aid understanding of the spirit and principles of the application, and the embodiments are not limited in this respect; rather, they may be applied to any applicable scenario.
The execution process of the hardware acceleration method is described below in connection with the following embodiments. Fig. 1 is a flowchart of a hardware acceleration method of an object positioning system according to an embodiment of the present application. The method is applied to a processor in an object positioning system. As shown in fig. 1, the method comprises the steps of:
Step 101: acquire a first executable program.
Here, an executable program refers to a definite, finite sequence of steps for solving a particular problem. For example, it may be an object positioning program that determines the position of an object, an edge detection program that recognizes image edges, or a recognition program that detects product defects. The application is applicable to a variety of program acceleration scenarios; the program types listed here are only examples and do not limit the application.
In the embodiments of the application, an executable program records an implementation and the related parameter variables. The functions performed by its steps may be described in a software programming language. A software programming language, also called a high-level language, is a computer programming language used to express programs; in short, it is a tool for describing programs that sits between machine language and mathematical language. Common software programming languages include, but are not limited to, C, Visual Basic, C++ and Java.
For convenience of description, the executable program to be processed is referred to as the first executable program in the embodiments of the application. The first executable program may implement object positioning, image recognition or other functions; the application does not limit this. The first executable program may include one or more programs.
Having described the first executable program acquired in the embodiments, its processing flow is described below with reference to a specific example.
In the related art, an executable program is fed into existing software for translation and integration to obtain a code file the device can execute, and the device then implements the program based on that code file. However, this conversion is usually a mechanical translation with significant limitations: it easily introduces a large amount of redundancy into the code file, so the device performs many useless operations that occupy its processing resources and lower its processing efficiency. Dynamic memory allocation and recursive functions in the program are especially prone to generating redundancy when converted this way.
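To illustrate why recursion is singled out, the hedged sketch below contrasts a directly recursive function with a rewritten form that uses fixed storage and a bounded loop, which is the kind of structure a hardware circuit can implement. This is a common rewrite chosen for illustration, not necessarily the conversion used in the application; the function names are hypothetical, and Fibonacci is used only because its redundant recursive calls are easy to count.

```python
from typing import List


def fib_recursive(n: int, calls: List[int]) -> int:
    """Direct recursive form; calls[0] counts how many invocations occur."""
    calls[0] += 1
    if n < 2:
        return n
    # Each call spawns two more, and the same sub-results are recomputed repeatedly
    return fib_recursive(n - 1, calls) + fib_recursive(n - 2, calls)


def fib_iterative(n: int) -> int:
    """Rewritten form: two fixed state variables updated by a bounded loop."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


calls = [0]
print(fib_recursive(10, calls), calls[0])  # 55 177
print(fib_iterative(10))  # 55
```

For n = 10 the recursive form makes 177 calls to compute 55, while the rewritten form needs ten loop iterations and two state variables, a fixed resource budget that maps naturally onto registers.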
By contrast, the embodiments of the application identify, in a targeted way, the parts of the executable program that implement specific computing functions and process those parts specifically, avoiding the code file redundancy and low device processing efficiency caused by mechanical translation in the related art and achieving hardware acceleration of the whole executable program. The implementation is described in steps 102 and 103 below.
Step 102: identify modules to be accelerated in the first executable program.
Step 103: reconstruct the modules to be accelerated into target modules to obtain a second executable program.
In the embodiments of the application, a module to be accelerated implements a specific computing function in the first executable program. The specific computing functions differ by program type. In an object positioning program, for example, the specific computing function may be at least one of: a data processing function for a three-dimensional point cloud (such as point cloud data conversion or matrix calculation), a function that calculates the object position from the point cloud data, a function that corrects the object position, and a function that optimizes the loss value of the object positioning model. These can be further refined into the dynamic memory allocation and recursive functions used in the object positioning program. In an image edge detection program, the specific computing function may be image segmentation, edge recognition or the like; in an image processing program, it may be key frame extraction, color conversion, image compression or the like. In practice, the specific computing functions can be selected according to optimization requirements, or adjusted dynamically according to how the device is executing, to help improve device execution efficiency and the hardware acceleration effect.
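As a hedged example of a point cloud data conversion kernel, the sketch below applies a rigid transform p' = Rp + t to every point. The function name and data layout are assumptions for illustration. The point of the example is structural: no iteration reads a result produced by another iteration, which is what makes such a kernel a natural candidate for reconstruction into concurrently executing hardware.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def transform_points(
    points: Sequence[Point],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """Apply p' = R p + t to each point (R is 3x3, t has 3 entries)."""
    out: List[Point] = []
    for x, y, z in points:
        # Each output point depends only on its own input point, R and t,
        # so these iterations could run in parallel hardware lanes.
        out.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return out


# 90-degree rotation about z, then a shift of 1 along x
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
print(transform_points([(1, 0, 0), (0, 1, 2)], R, (1, 0, 0)))  # [(1, 1, 0), (0, 0, 2)]
```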
The modules to be accelerated in the first executable program are executed sequentially. The first executable program usually comprises a plurality of steps, and a software programming language usually fixes the execution order of those steps when describing them, either by defining the order directly or by defining execution conditions for the steps, so that the steps must be executed in a particular order.
As an optional embodiment, in step 102, target code matching the program type of the first executable program may be detected in the first executable program and treated as a module to be accelerated. Further optionally, matching key information may be preset for each program type and used to detect the target code in the first executable program; the key information may be, for example, a function that implements a specific computing function, a variable object having the specific computing function, or various kinds of package information related to the target code.
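The keyword-based detection in this embodiment might look like the following sketch. The key-information table, its patterns and the function name are hypothetical; the application does not specify the format of the key information. Here a program is modelled as a mapping from function names to source text, and a function is flagged when its source matches any pattern preset for the program type.

```python
import re
from typing import Dict, List

# Hypothetical key information preset per program type (regular expressions)
KEY_INFO: Dict[str, List[str]] = {
    "object_positioning": [r"\bmatmul\s*\(", r"\bmalloc\s*\("],
    "edge_detection": [r"\bsobel\s*\(", r"\bsegment\s*\("],
}


def find_modules_to_accelerate(functions: Dict[str, str], program_type: str) -> List[str]:
    """Return the names of functions whose source matches the key information."""
    patterns = [re.compile(p) for p in KEY_INFO.get(program_type, [])]
    return [
        name
        for name, source in functions.items()
        if any(p.search(source) for p in patterns)
    ]


program = {
    "load": "fread(buf, 1, n, fp);",
    "locate": "pose = matmul(R, p);",
    "report": 'printf("%f", pose);',
}
print(find_modules_to_accelerate(program, "object_positioning"))  # ['locate']
```

Text matching is only the simplest option; a real tool would more likely work on a parsed representation of the program, but the per-program-type lookup is the same idea.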
In another optional embodiment, in step 102, the association relationships between the code modules of the first executable program are analyzed according to preset software programming language rules, and the code modules to be optimized are selected as modules to be accelerated according to those association relationships and an optimization rule. The optimization rule may be adjusted dynamically based on how the device executes the program: for example, when a certain stage is detected to execute inefficiently, the timing of the code module corresponding to that stage and of its associated code modules is adjusted and optimized, so that execution does not stall at that stage.
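The second embodiment, selecting modules from their association relationships under an optimization rule, can be sketched as below. Everything here is an assumption for illustration: associations are modelled as a dependency map, the device's execution condition as measured per-module times, and the optimization rule as "a module whose share of total execution time reaches a threshold is slow, and it is selected together with the modules directly associated with it".

```python
from typing import Dict, List, Set


def select_modules(
    deps: Dict[str, List[str]],
    exec_time: Dict[str, float],
    share_threshold: float,
) -> Set[str]:
    """Select slow modules plus the modules directly associated with them.

    deps maps each module to the modules it depends on; exec_time holds
    measured execution times (hypothetical profiling data).
    """
    total = sum(exec_time.values())
    if total <= 0:
        return set()
    slow = {m for m, t in exec_time.items() if t / total >= share_threshold}
    selected = set(slow)
    for module, upstream in deps.items():
        if module in slow:
            selected.update(upstream)  # modules the slow module depends on
        elif slow.intersection(upstream):
            selected.add(module)  # modules that depend on a slow module
    return selected


deps = {"load": [], "filter": ["load"], "locate": ["filter"], "report": ["locate"]}
times = {"load": 1.0, "filter": 6.0, "locate": 2.0, "report": 1.0}
print(sorted(select_modules(deps, times, 0.5)))  # ['filter', 'load', 'locate']
```

Because the threshold and the timing data are inputs, re-running the selection with fresh measurements is one way to realize the dynamically adjusted rule described above.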
Step 102 may also be implemented in other ways, which are not enumerated here. Whatever the implementation, the purpose of step 102 is to select the parts of the first executable program that need optimization, giving the subsequent steps a more targeted optimization object, avoiding redundancy in the conversion process, and further improving the hardware acceleration of the first executable program. In some embodiments, the whole first executable program may also be converted directly; in that case it may be split into a plurality of modules to be accelerated, with split boundaries set according to the specific computing functions.
After the modules to be accelerated are identified, step 103 optimizes them so that they can invoke hardware resources more efficiently, achieving hardware acceleration of the whole first executable program. Specifically, in step 103, the modules to be accelerated are reconstructed into target modules to obtain a second executable program.
In the embodiments of the application, a plurality of target modules in the second executable program are executed concurrently to achieve hardware acceleration of the object positioning system. Because the reconstructed target modules can schedule hardware resources concurrently, each target module can invoke the hardware resources it needs at the same time as the others, so the corresponding steps are carried out concurrently rather than one after another as with the modules to be accelerated in the first executable program, further improving program execution efficiency. In addition, optimizing only the modules to be accelerated, rather than mechanically translating the whole executable program, avoids code file redundancy and low device processing efficiency, further improving the execution efficiency of the second executable program.
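The gain from concurrent target modules can be estimated with a textbook pipeline timing model, which is an assumption here rather than a figure from the application. Suppose each item (for example, one frame of point cloud data) passes through a chain of stages with fixed latencies in clock cycles. Executed linearly, each item finishes all stages before the next starts; with concurrent stages, a stage starts the next item as soon as it is free and the previous stage has handed the item over. The function names are hypothetical.

```python
from typing import List


def sequential_cycles(n_items: int, stage_latency: List[int]) -> int:
    # Linear operation: each item passes through every stage before the next item starts
    return n_items * sum(stage_latency)


def pipelined_cycles(n_items: int, stage_latency: List[int]) -> int:
    # Concurrent stages: finish[k] is when stage k finished its previous item.
    # Stage k starts an item once stage k-1 has finished it and stage k is free.
    finish = [0] * len(stage_latency)
    for _ in range(n_items):
        ready = 0  # the item is available to stage 0 immediately
        for k, lat in enumerate(stage_latency):
            start = max(ready, finish[k])
            finish[k] = start + lat
            ready = finish[k]
    return finish[-1] if stage_latency else 0


lat = [2, 3, 1]
print(sequential_cycles(100, lat), pipelined_cycles(100, lat))  # 600 303
```

With stage latencies of 2, 3 and 1 cycles and 100 items, the model gives 600 cycles for linear execution versus 303 for the pipeline, matching the closed form sum(L) + (N - 1) * max(L) = 6 + 99 * 3. Throughput is then set by the slowest stage, which is why balancing stage latencies matters when modules are split.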
For example, if the first executable program is a C source file (.c), the second executable program may be a register transfer level (RTL) file written in SystemVerilog. Likewise, if the first executable program is a .cpp file written in C++, the second executable program may be an RTL file written in SystemVerilog.
The most important feature of RTL-level languages is that RTL is a synthesizable level of description. An RTL file mainly describes the logic between registers, i.e. it describes the circuit at the HDL level. RTL is a higher level of abstraction than the gate level, so describing a hardware circuit in an RTL language is generally simpler and more efficient than a gate-level description. On this basis, in the embodiments of the application, the target module obtained by converting a module to be accelerated may further optionally be encapsulated as an RTL-level IP core to further improve its execution efficiency. The specific encapsulation method is described in the embodiments below and is not detailed here.
Specifically, to optimize the modules to be accelerated in a targeted way, reconstructing the modules to be accelerated into target modules in step 103 to obtain the second executable program may, as shown in FIG. 2, be implemented through the following steps 201 to 203:
Step 201: obtain configuration information from the module to be accelerated;
Step 202: convert first program information, written in a software programming language, in the module to be accelerated into second program information written in a hardware description language;
Step 203: reconstruct the target module based on the configuration information and the second program information to obtain the second executable program.
In this embodiment of the application, the second executable program is written in a hardware description language (HDL). An HDL describes the behavior, structure, and data flow of electronic system hardware in text form. It can represent logic circuit diagrams, logic expressions, and the logic functions performed by a digital logic system. With an HDL, the design of a digital circuit can be described layer by layer from top to bottom (from abstract to concrete), and extremely complex digital systems can be represented as a hierarchy of modules. Electronic design automation (EDA) tools are then used to verify the design by simulation, layer by layer. An automatic synthesis tool converts the modules that are to become actual circuits into a gate-level netlist. Finally, the automatic place-and-route tools for an application-specific integrated circuit (ASIC) or a field programmable gate array (FPGA) convert the netlist into the concrete circuit routing structure to be implemented. In practice, hardware description languages include, but are not limited to, Verilog HDL (Verilog for short) and VHDL.
By comparison, a software programming language is closer to a mathematical description, whereas a hardware description language is closer to a digital circuit. An HDL therefore has the advantage when invoking hardware resources and does so more efficiently. For example, Verilog is a hardware description language that supports parallel operation, and once compiled and downloaded to an FPGA it generates the corresponding digital circuits. C is a software programming language, and once compiled and downloaded to a processor it exists as a sequence of instructions in the processor's memory.
Based on this principle, the steps above convert the language of the parts of the executable program that need acceleration (i.e., the modules to be accelerated) from a software programming language to a hardware description language. The specific computing functions in the executable program thus change from sequential to concurrent execution and are realized directly as digital circuits, so hardware resources are used more efficiently and the execution efficiency of the executable program is further improved.
The following describes in detail the flow of the conversion from the module to be accelerated to the target module.
Step 201: obtain the configuration information of the module to be accelerated.
In this embodiment of the application, the configuration information of the module to be accelerated includes at least input information, output information, clock setting information, and reset setting information. These items are merely examples and do not limit practical applications. The configuration information can be understood as the basic settings of the module to be accelerated. It must be obtained here so that the basic settings of the target module can be set up again when the language environment is subsequently converted from the software programming language to the hardware description language.
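As a minimal sketch of how the configuration information of step 201 might be recorded and reused, the Python snippet below stores the input, output, clock, and reset settings and emits a Verilog-2001 port list for the target module. The class and function names, the signal names `clk` and `rst_n`, and the choice of Verilog-2001 port syntax are illustrative assumptions, not details given in the source.

```python
from dataclasses import dataclass, field

@dataclass
class Port:
    name: str
    width: int  # bit width of the port

@dataclass
class ModuleConfig:
    """Hypothetical container for the configuration information of step 201."""
    inputs: list[Port] = field(default_factory=list)   # input information
    outputs: list[Port] = field(default_factory=list)  # output information
    clock: str = "clk"    # clock setting information (assumed signal name)
    reset: str = "rst_n"  # reset setting information (assumed signal name)

def port_decl(direction: str, port: Port) -> str:
    # 1-bit ports are declared without a bit range
    rng = f"[{port.width - 1}:0] " if port.width > 1 else ""
    return f"    {direction} wire {rng}{port.name}"

def emit_header(name: str, cfg: ModuleConfig) -> str:
    """Emit a Verilog-2001 module header that re-creates the basic settings."""
    ports = [port_decl("input", Port(cfg.clock, 1)),
             port_decl("input", Port(cfg.reset, 1))]
    ports += [port_decl("input", p) for p in cfg.inputs]
    ports += [port_decl("output", p) for p in cfg.outputs]
    return f"module {name} (\n" + ",\n".join(ports) + "\n);"

cfg = ModuleConfig(inputs=[Port("x", 32)], outputs=[Port("dist", 32)])
header = emit_header("func1_top", cfg)
print(header)
```

Here the header contains `clk`, `rst_n`, a 32-bit input `x`, and a 32-bit output `dist`. Keeping the configuration as data rather than text lets the same record drive both the port list and, later, the reset logic of the target module.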
Step 202: convert the first program information in the module to be accelerated, which is written in a software programming language, into second program information written in a hardware description language.
Specifically, the function implemented by the first program information may be re-described directly in the hardware description language to obtain the second program information.
In an alternative embodiment, referring to fig. 3, the conversion in step 202 may instead be implemented through the following steps:
Step 301: identify keywords in the module to be accelerated;
Step 302: analyze the context information related to the keywords to obtain the first program information corresponding to the keywords;
Step 303: reconstruct the first program information to obtain the second program information.
In this embodiment of the application, the keywords in the module to be accelerated are preset according to the software programming language. For example, if the software programming language is C, the keywords may be C keywords related to data computation. Alternatively, the keywords may be key variables, such as variables holding position information (coordinate points, relative distances, and so on) in the object positioning program.
Specifically, in step 301, the code information in the module to be accelerated is parsed to obtain its abstract syntax tree (AST). The nodes of the AST are then analyzed to find the keywords in the code information that match the preset keyword types.
After the keywords in the module to be accelerated are identified, step 302 analyzes the context information related to the keywords to obtain the corresponding first program information. Specifically, each keyword is used to locate its associated nodes in the AST, and the context information of those nodes is collected as the first program information corresponding to that keyword.
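A minimal sketch of steps 301 and 302 follows. It is only an illustration: Python's built-in `ast` module stands in for a real C front end (such as libclang or pycparser), so the sample code is written as C-style arithmetic statements that also happen to be valid Python. The preset keyword types (`KEY_VARIABLES`, `KEY_OPERATORS`) and the rule "a statement is context for a keyword if it references the keyword and performs arithmetic" are assumptions made for the example.

```python
import ast

# Hypothetical preset keyword types: key positioning variables and arithmetic operators
KEY_VARIABLES = {"x0", "y0", "x1", "y1", "dist"}
KEY_OPERATORS = (ast.Add, ast.Sub, ast.Mult)

def find_keywords(tree: ast.AST) -> set[str]:
    """Step 301 sketch: walk every AST node and collect names that are key variables."""
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in KEY_VARIABLES:
            found.add(node.id)
    return found

def extract_context(tree: ast.Module, keywords: set[str]) -> list[str]:
    """Step 302 sketch: keep each top-level statement that references a keyword and
    contains a key arithmetic operator; these statements form the first program information."""
    info = []
    for stmt in tree.body:
        names = {n.id for n in ast.walk(stmt) if isinstance(n, ast.Name)}
        has_math = any(isinstance(n, ast.BinOp) and isinstance(n.op, KEY_OPERATORS)
                       for n in ast.walk(stmt))
        if names & keywords and has_math:
            info.append(ast.unparse(stmt))  # ast.unparse needs Python 3.9+
    return info

SOURCE = """
dx = x1 - x0
dy = y1 - y0
dist = dx * dx + dy * dy
count = count + 1
"""
tree = ast.parse(SOURCE)
keywords = find_keywords(tree)
first_program_info = extract_context(tree, keywords)
print(first_program_info)
```

The `count = count + 1` statement is dropped because it touches no key variable, which is the filtering effect the keyword step is meant to have.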
In this embodiment of the application, the code and parameter information related to each specific program function in the object positioning program are collectively referred to as program information. To distinguish them, the program information in the module to be accelerated is called first program information, and the program information in the target module is called second program information.
After the first program information is obtained, it must be converted into code and parameter information used in the hardware description language environment (i.e., the second program information).
In step 303, the variables in the first program information are converted according to a variable mapping rule between the software programming language and the hardware description language to obtain the variables in the second program information. In addition to variables, the functions in the first program information may be converted according to a function mapping rule between the two languages to obtain the functions in the second program information. The variable mapping rule and/or the function mapping rule are determined from the grammar rules of the software programming language and the hardware description language.
For example, assume that the first executable program is written in C and the second executable program is written in Verilog. Before step 303, a C-to-Verilog mapping rule may be established in advance, and the variables and functions in the first program information are then converted according to this rule to obtain the variables and functions in the second program information.
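The mapping rules of step 303 can be pictured as lookup tables. The sketch below is a hypothetical C-to-Verilog rule set covering only a few integer scalar types and one library function (`abs`); the table contents, the regular expression for declarations, and the use of Verilog `reg` declarations are assumptions for illustration. Floating-point types are deliberately left unmapped, since they have no direct synthesizable counterpart and would need a separate conversion (for example to fixed point).

```python
import re

# Hypothetical variable mapping rule: C integer scalar types -> Verilog declarations
TYPE_MAP = {
    "char": "reg signed [7:0]",
    "unsigned char": "reg [7:0]",
    "short": "reg signed [15:0]",
    "unsigned short": "reg [15:0]",
    "int": "reg signed [31:0]",
    "unsigned int": "reg [31:0]",
}

# Hypothetical function mapping rule: C library call -> Verilog expression template
FUNC_MAP = {
    "abs": "(({0}) < 0 ? -({0}) : ({0}))",
}

# Matches simple declarations such as "int dx;" or "unsigned char flag;"
DECL = re.compile(r"^\s*((?:unsigned\s+)?\w+)\s+(\w+)\s*;\s*$")

def map_variable(c_decl: str) -> str:
    """Translate one C variable declaration using TYPE_MAP."""
    m = DECL.match(c_decl)
    if m is None:
        raise ValueError(f"unsupported declaration: {c_decl!r}")
    c_type = " ".join(m.group(1).split())  # normalise internal whitespace
    name = m.group(2)
    if c_type not in TYPE_MAP:
        raise KeyError(f"no mapping rule for C type {c_type!r}")
    return f"{TYPE_MAP[c_type]} {name};"

def map_function(name: str, arg: str) -> str:
    """Translate a single-argument C function call using FUNC_MAP."""
    return FUNC_MAP[name].format(arg)

print(map_variable("int dx;"))              # reg signed [31:0] dx;
print(map_variable("unsigned char flag;"))  # reg [7:0] flag;
print(map_function("abs", "dx"))            # ((dx) < 0 ? -(dx) : (dx))
```

Raising an error for unmapped types, rather than guessing, marks exactly the cases that the source suggests handing to developers for manual re-description.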
However, software programming languages and hardware description languages differ in everything from design philosophy to implementation principle, so the mapping rules introduced above may still produce redundancy or other anomalies. Another alternative embodiment therefore provides a different way of converting program information: the first program information is provided to the relevant developers, who re-describe the step logic it implements in a hardware description language to obtain the second program information. This makes the optimization of the second program information more flexible and more targeted.
In addition, to increase the degree of automation of step 303, new conversion patterns may optionally be extracted from the first program information and the manually re-described second program information produced in this process, and dynamically added to the variable mapping rule and/or the function mapping rule.
Manual re-description can also complement the mapping-rule embodiment: the relevant developers can correct and optimize the second program information that the mapping rules produce.
Further, to increase the degree of automation of step 303, the first program information and the corresponding manually re-described second program information from this process may be used as a training set for a language conversion model, yielding a model that converts first program information into second program information.
Step 203: reconstruct the target module based on the configuration information and the second program information to obtain the second executable program.
Specifically, for each module to be accelerated in the first executable program, the configuration information of the corresponding target module is set based on the obtained configuration information and the grammar rules of the hardware description language. The corresponding second program information is then added to the configured target module, completing the construction of the target module.
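The assembly in step 203 can be sketched as a function that wraps translated statements in a module built from the configuration information. This is a hypothetical helper, not the source's implementation: it assumes an active-low asynchronous reset, registered outputs cleared on reset, and second program information already expressed as Verilog nonblocking assignments.

```python
def build_target_module(name, inputs, outputs, body_lines, clock="clk", reset="rst_n"):
    """Assemble a Verilog target module (hypothetical sketch of step 203).

    inputs/outputs: lists of (port_name, bit_width) from the configuration information
    body_lines:     translated second program information (nonblocking assignments)
    """
    ports = [f"    input wire {clock}", f"    input wire {reset}"]
    ports += [f"    input wire [{w - 1}:0] {n}" for n, w in inputs]
    ports += [f"    output reg [{w - 1}:0] {n}" for n, w in outputs]
    # Reset setting information: clear every registered output (active-low async reset assumed)
    resets = [f"            {n} <= {w}'d0;" for n, w in outputs]
    body = [f"            {line}" for line in body_lines]
    return "\n".join([
        f"module {name} (",
        ",\n".join(ports),
        ");",
        f"    always @(posedge {clock} or negedge {reset}) begin",
        f"        if (!{reset}) begin",
        *resets,
        "        end else begin",
        *body,
        "        end",
        "    end",
        "endmodule",
    ])

module_text = build_target_module(
    "func3_top",
    inputs=[("dx", 16), ("dy", 16)],
    outputs=[("dist", 32)],
    body_lines=["dist <= dx * dx + dy * dy;"],
)
print(module_text)
```

Separating the port and reset skeleton (configuration information) from the body (second program information) mirrors the two inputs that step 203 combines.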
Through steps 201 to 203, the language of the parts of the executable program that need acceleration (i.e., the modules to be accelerated) is converted from a software programming language to a hardware description language. The specific computing functions in the executable program therefore change from sequential to concurrent execution, further improving the execution efficiency of the executable program.
Step 104: the processor sends the second executable program to the FPGA so that the FPGA executes it, achieving hardware acceleration of the object positioning system.
Specifically, the processor sends the optimized second executable program to the FPGA. After receiving it, the FPGA executes the second executable program, driving its multiple target modules in parallel to achieve hardware acceleration of each specific computing function in the object positioning system.
In this embodiment, identifying the modules to be accelerated that perform specific computing functions allows the executable program to be accelerated in a more targeted way. The modules that perform these functions are updated from sequentially executed modules to be accelerated into concurrently executed target modules. This achieves hardware acceleration of the specific computing functions and builds a pipeline at the level of the hardware circuit structure, changing how the program executes and greatly improving its data processing efficiency and execution speed.
In the above or following embodiments, the present application further provides a method for optimizing the modules to be accelerated. As an alternative embodiment, shown in fig. 4, the optimization may be implemented through the following steps:
Step 401: analyze the multiple modules to be accelerated in the first executable program to obtain their timing information;
Step 402: based on the timing information of the modules to be accelerated, configure each of the multiple target modules to a corresponding computing kernel;
Step 403: invoke, in a pipelined manner, the hardware resources corresponding to each target module through its computing kernel, so that the multiple target modules execute the specific computing functions concurrently.
Specifically, in step 401, the first program information in each module to be accelerated is analyzed to obtain read-write time information, which includes the time at which the input value is written and the time at which the output value is read. The time period between the input value write time and the output value read time is then determined.
For example, referring to fig. 5, assume that the modules to be accelerated in the first executable program are the three function modules func1, func2, and func3 shown in fig. 5, and that the central processing unit (CPU) calls them sequentially and repeatedly in the execution order specified in the first executable program. In the call pattern shown in fig. 5, the three function modules form a function module sequence, and the CPU repeatedly calls this sequence in the order of the three modules.
Under these assumptions, the first program information in func1, func2, and func3 is analyzed to obtain the read-write time information of each module, namely its input value write time and output value read time. The time period between these two times is calculated for each module and used as that module's timing information.
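The timing information of step 401 reduces to a subtraction per module. The sketch below assumes the read-write times have already been obtained (the source does not say whether by profiling or static analysis); the record type and the time values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReadWriteTimes:
    module: str
    input_write_time: int  # when the input value is written (hypothetical units, e.g. cycles)
    output_read_time: int  # when the output value is read

def timing_info(records: list[ReadWriteTimes]) -> dict[str, int]:
    """Timing information of each module: the write-to-read time period."""
    info = {}
    for r in records:
        if r.output_read_time < r.input_write_time:
            raise ValueError(f"{r.module}: output read before input write")
        info[r.module] = r.output_read_time - r.input_write_time
    return info

# Hypothetical read-write times for one pass of func1 -> func2 -> func3
records = [
    ReadWriteTimes("func1", 0, 40),
    ReadWriteTimes("func2", 40, 100),
    ReadWriteTimes("func3", 100, 130),
]
periods = timing_info(records)
print(periods)  # {'func1': 40, 'func2': 60, 'func3': 30}
```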
After the timing information of the modules to be accelerated is obtained, step 402 configures each target module to a corresponding computing kernel based on that timing information. Specifically, a computing kernel is created for each target module. Each target module is called in a loop within its computing kernel, and the connections between the computing kernels are set according to the time periods. Finally, in step 403, the computing kernels invoke the hardware resources corresponding to the target modules in a pipelined manner, so that the multiple target modules execute the specific computing functions concurrently.
Continuing the above example, the modules to be accelerated are func1, func2, and func3 shown in fig. 5, which the CPU calls sequentially and repeatedly in the execution order specified in the first executable program.
Under these assumptions, a computing kernel is created for each of the three function modules in fig. 5; that is, as shown in fig. 6, three kernels K1, K2, and K3 are created in the FPGA. The corresponding target modules func1', func2', and func3' are called in a loop in K1, K2, and K3 respectively. To further improve execution efficiency, the call periods of func1', func2', and func3' are adjusted according to the time periods of func1, func2, and func3, so that each call period matches the output time of the preceding target module, further shortening the execution latency.
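The kernel configuration can be sketched as follows. The source does not state the exact rule for matching call periods; this sketch assumes one simple rule, namely that every kernel is called once per bottleneck period (the longest module time period), so that no stage is called faster than its upstream stage can supply outputs. The `Kernel` record and the time values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Kernel:
    name: str              # computing kernel, e.g. K1
    target_module: str     # target module called in a loop inside the kernel
    period: int            # time period of the module (its timing information)
    call_period: int = 0   # interval between successive calls of the target module
    downstream: Optional["Kernel"] = None  # kernel that receives this kernel's output

def build_kernel_chain(timing: dict[str, int], order: list[str]) -> list[Kernel]:
    """Create one kernel per target module and connect them in execution order."""
    kernels = [Kernel(f"K{i + 1}", f"{m}'", timing[m]) for i, m in enumerate(order)]
    for up, down in zip(kernels, kernels[1:]):
        up.downstream = down  # K1 -> K2 -> K3
    # Assumed matching rule: all kernels share the bottleneck period, so each stage
    # consumes and produces one item per period and never outruns its upstream stage
    bottleneck = max(k.period for k in kernels)
    for k in kernels:
        k.call_period = bottleneck
    return kernels

chain = build_kernel_chain({"func1": 40, "func2": 60, "func3": 30},
                           ["func1", "func2", "func3"])
for k in chain:
    print(k.name, k.target_module, k.call_period,
          k.downstream.name if k.downstream else "API")
```

A finer-grained design could give faster stages shorter call periods with buffering between kernels; the uniform period is chosen here only because it is the simplest rule consistent with the description.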
Referring to the timing relationship shown in fig. 6, kernels K1, K2, and K3 can execute in parallel. After the first call of func1' processes the first input value WR, it passes the result to func2' and immediately continues with the next input value WR, without waiting for func3' to report its first output value RD to the application programming interface (API). Similarly, after func2' finishes processing the first result from func1', it passes its own result to func3' and continues with the next result from func1', again without waiting for the first output value RD reported by func3' to the API.
The examples in fig. 5 and fig. 6 show that, compared with executing func1, func2, and func3 sequentially, calling func1', func2', and func3' in pipelined form processes input values continuously. This avoids waiting for each written input value to pass through func1 to func3 before the next one can start. It also shortens the read-write time of each input and output value in the data stream, improves data stream processing efficiency, and strengthens the hardware acceleration of program execution.
In this embodiment, creating multiple computing kernels and calling the corresponding target modules within them reduces the data stream processing delay caused by executing multiple modules to be accelerated sequentially. This shortens the read-write time of input and output values, improves data stream processing efficiency, and strengthens the hardware acceleration of program execution.
In the above or following embodiments, the present application further provides a hardware acceleration method that combines multiple development tools to achieve hardware acceleration and further speed up program execution.
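The benefit described above can be quantified with a small model. Assume each kernel processes one input at a time, the buffer between adjacent kernels is large enough that no stage stalls on a full buffer, and transfer overhead is ignored. Then N inputs through stages with time periods t1 ... tS take N x (t1 + ... + tS) when executed sequentially, but (t1 + ... + tS) + (N - 1) x max(ts) when pipelined. The sketch below simulates both cases; the stage times 40, 60, and 30 are hypothetical.

```python
def sequential_time(stage_times: list[int], n_inputs: int) -> int:
    """CPU-style execution: each input passes func1 -> func2 -> func3 before the next starts."""
    return n_inputs * sum(stage_times)

def pipelined_time(stage_times: list[int], n_inputs: int) -> int:
    """Kernels run concurrently: stage s can start input i once stage s-1 has
    finished input i and stage s itself has finished input i-1."""
    finish = [0] * len(stage_times)  # finish time of the latest input at each stage
    for _ in range(n_inputs):
        ready = 0  # time at which the current input is available to the next stage
        for s, t in enumerate(stage_times):
            start = max(ready, finish[s])
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]

stages = [40, 60, 30]  # hypothetical time periods of func1', func2', func3'
print(sequential_time(stages, 4))  # 520
print(pipelined_time(stages, 4))   # 310 = 130 + 3 * 60
```

For large N the pipelined throughput approaches one output per bottleneck period (60 here) instead of one per full pass (130), which is where the acceleration in fig. 6 comes from.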
Specifically, as an optional embodiment, after the module to be accelerated is reconstructed into the target module to obtain the second executable program, the second executable program is linked to the target hardware platform to create a device binary file. The target runtime library and the device binary file are then used to interact with the register transfer level (RTL) kernel in the programmable logic region of the target hardware platform, achieving hardware acceleration of the object positioning program. The device binary file describes the hardware circuit structure corresponding to the program.
Specifically, assume that the target runtime library is the Xilinx runtime library (XRT) and that the target hardware platform is a target FPGA platform. The target runtime libraries to which the present application relates are, of course, not limited to this example.
Under these assumptions, linking the second executable program to the target hardware platform to create the device binary file may, as shown in fig. 7, be implemented through the following steps 601 to 602:
Step 601: create a programmable logic (PL) kernel from the second executable program using a first development tool;
Step 602: link the PL kernel into the target FPGA platform using a second development tool to create the device binary file.
Assume that the first development tool is Vivado and the second development tool is Vitis.
Under these assumptions, in step 601 the second executable program is verified in the Vivado simulator. If it passes verification, the RTL-level IP core in the second executable program is packaged into the PL kernel using the kernel packaging tool in Vivado. In step 602, the PL kernel is linked to the target FPGA platform using the Vitis linker to obtain the device binary file.
After the device binary file is created, interacting with the RTL kernel in the programmable logic region of the target hardware platform through the target runtime library and the device binary file may be implemented as follows:
the host program uses the application programming interfaces (APIs) and drivers of the Xilinx runtime library to call the RTL kernel in the target FPGA platform through the device binary file.
For example, referring to fig. 8, after the C/C++ file (i.e., the first executable program) undergoes the processing of steps 101 to 103, an RTL file (i.e., the second executable program) is obtained.
In step 601, the RTL file is verified with the Vivado simulator. Verification includes, but is not limited to, RTL synthesis, placement and routing, and timing analysis, and continues until review of the timing report shows that the design meets its performance targets, i.e., passes verification. The RTL-level IP in the RTL file is then packaged into kernel.xo (i.e., the PL kernel) using the design suite (Vivado Design Suite in fig. 8). In step 602, kernel.xo is linked to the FPGA platform through the Vitis PL kernel linker to obtain a device binary file with the .xclbin suffix. Finally, the device binary file is loaded by the running application and interacts with host.exe (i.e., the host program) through the Xilinx runtime library. host.exe is obtained by compiling main.cpp, the host program's source file, with the g++ compiler in the x86 application compilation toolchain.
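For orientation, the fig. 8 flow maps onto three command lines. The helper below only assembles them as strings and does not run any tool. The kernel name, IP directory, platform name, and the XRT install path `/opt/xilinx/xrt` are hypothetical placeholders. The options shown (`package_xo -xo_path/-kernel_name/-ip_directory`, `v++ --link --target --platform -o`, and linking the host against `xrt_coreutil` for the XRT native C++ API) should be checked against the documentation of the installed Vivado, Vitis, and XRT versions.

```python
import shlex

def build_flow_commands(ip_dir: str, kernel_name: str, platform: str,
                        target: str = "hw") -> dict[str, str]:
    """Return the fig. 8 build steps as command strings (nothing is executed)."""
    xo = f"{kernel_name}.xo"
    xclbin = f"{kernel_name}.xclbin"
    # Step 601: Vivado Tcl command packaging the verified RTL IP into a PL kernel (.xo)
    package = f"package_xo -xo_path {xo} -kernel_name {kernel_name} -ip_directory {ip_dir}"
    # Step 602: Vitis linker links the .xo against the target FPGA platform -> .xclbin
    link = shlex.join(["v++", "--link", "--target", target,
                       "--platform", platform, "-o", xclbin, xo])
    # Host program: main.cpp compiled with g++ against the XRT native C++ API
    host = shlex.join(["g++", "-std=c++17", "-I/opt/xilinx/xrt/include", "main.cpp",
                       "-L/opt/xilinx/xrt/lib", "-lxrt_coreutil", "-pthread",
                       "-o", "host.exe"])
    return {"package": package, "link": link, "host": host}

cmds = build_flow_commands("./ip_repo", "locate_kernel", "my_fpga_platform")
for step, cmd in cmds.items():
    print(f"{step}: {cmd}")
```

Building argument lists with `shlex.join` keeps each option a separate token, which makes the script safe to hand to `subprocess.run(shlex.split(...))` later if the flow is automated.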
This embodiment converts the second executable program into a digital circuit (specifically, an RTL circuit structure), so that hardware circuit resources are invoked directly through multiple development tools to accelerate the program, further improving execution efficiency.
Having described the method of the embodiments of the present application, the hardware acceleration system of the embodiments is described next with reference to fig. 9. The hardware acceleration system shown in fig. 9 includes at least a processor 100 and an FPGA 200.
In the hardware acceleration system shown in fig. 9, the processor 100 is mainly configured to: acquire a first executable program used to implement object positioning; identify modules to be accelerated in the first executable program, each of which implements a specific computing function, the multiple modules to be accelerated being executed sequentially; and reconstruct the modules to be accelerated into target modules to obtain a second executable program, and send the second executable program to the FPGA 200, where the multiple target modules in the second executable program are executed concurrently.
The FPGA 200 is mainly configured to receive the second executable program and execute it to achieve hardware acceleration of the object positioning program.
In some alternative embodiments, when reconstructing the module to be accelerated into the target module, the processor 100 is configured to: acquire the configuration information of the module to be accelerated, the configuration information including at least input information, output information, clock setting information, and reset setting information; convert the first program information in the module to be accelerated, which is written in a software programming language, into second program information written in a hardware description language; and reconstruct the target module based on the configuration information and the second program information to obtain the second executable program.
In some alternative embodiments, when converting the first program information in the module to be accelerated, which is written in a software programming language, into second program information written in a hardware description language, the processor 100 is configured to:
identify keywords in the module to be accelerated, the keywords being preset according to the software programming language; analyze the context information related to the keywords to obtain the first program information corresponding to the keywords; and reconstruct the first program information to obtain the second program information.
In some alternative embodiments, when identifying keywords in the module to be accelerated, the processor 100 is configured to:
parse the code information in the module to be accelerated to obtain an abstract syntax tree of the code information, and analyze the nodes in the abstract syntax tree to obtain the keywords in the code information that match the preset keyword types.
When analyzing the context information related to the keywords, the processor 100 is configured to use each keyword to obtain the context information of its associated nodes in the abstract syntax tree, yielding the first program information corresponding to that keyword.
In some alternative embodiments, when reconstructing the first program information to obtain the second program information, the processor 100 is configured to:
convert the variables in the first program information according to a variable mapping rule between the software programming language and the hardware description language to obtain the variables in the second program information; and/or convert the functions in the first program information according to a function mapping rule between the two languages to obtain the functions in the second program information. The variable mapping rule and/or the function mapping rule are determined from the grammar rules of the software programming language and the hardware description language.
In some alternative embodiments, when reconstructing the target module based on the configuration information and the second program information, the processor 100 is configured to:
for each module to be accelerated in the first executable program, set the configuration information of the corresponding target module based on the configuration information and the grammar rules of the hardware description language; and add the corresponding second program information to the configured target module to complete its construction.
In some alternative embodiments, the processor 100 is further configured to: analyze the multiple modules to be accelerated in the first executable program to obtain their timing information; configure each of the multiple target modules to a corresponding computing kernel based on that timing information; and invoke, in a pipelined manner, the hardware resources corresponding to the target modules through the computing kernels, so that the multiple target modules execute the specific computing functions concurrently.
In some alternative embodiments, when analyzing the multiple modules to be accelerated in the first executable program to obtain their timing information, the processor 100 is configured to:
analyze the first program information in each module to be accelerated to obtain read-write time information, which includes the input value write time and the output value read time; and determine the time period between the input value write time and the output value read time.
When configuring the multiple target modules to corresponding computing kernels based on the timing information of the modules to be accelerated, the processor 100 is configured to: create a computing kernel for each target module; and call each target module in a loop within its computing kernel, setting the connections between the computing kernels of the target modules according to the time periods.
In some alternative embodiments, the processor 100 is further configured to: after reconstructing the module to be accelerated into the target module to obtain the second executable program, link the second executable program to a target hardware platform to create a device binary file; and interact, through a target runtime library and the device binary file, with the register transfer level (RTL) kernel in the programmable logic region of the target hardware platform to achieve hardware acceleration of the object positioning program. The device binary file describes the hardware circuit structure corresponding to the executable program.
In some alternative embodiments, the target hardware platform is a target FPGA platform.
When linking the second executable program to the target hardware platform to create the device binary file, the processor 100 is configured to: create a programmable logic (PL) kernel from the second executable program using a first development tool; and link the PL kernel into the target FPGA platform using a second development tool to create the device binary file.
In some alternative embodiments, when creating the PL kernel from the second executable program using the first development tool, the processor 100 is configured to: verify the second executable program in the simulator of the first development tool; and, if the second executable program passes verification, package the RTL-level IP core in the second executable program into the PL kernel using the kernel packaging tool of the first development tool.
When linking the PL kernel into the target FPGA platform using the second development tool to create the device binary file, the processor 100 is configured to connect the PL kernel to the target FPGA platform through the linker of the second development tool to obtain the device binary file.
In this embodiment, a processing architecture of a processor and an FPGA is introduced, a to-be-accelerated module having a specific computing function in an executable program is identified by the processor, so that the executable program is optimized in a targeted acceleration manner, the module for executing the specific computing function in the executable program is updated from the to-be-accelerated module executed sequentially to a target module executed concurrently, thereby more effectively utilizing hardware resources on the FPGA chip, realizing hardware acceleration of the specific computing function in the executable program, and constructing a pipeline mode on a hardware circuit structure level, thereby improving the execution mode of the program, greatly improving the data processing efficiency of the executable program, and improving the program execution speed.
In any of the foregoing or following embodiments, an embodiment of the present application further provides a processor applied to an object positioning system. The processor includes the following modules:
a transceiver module, configured to acquire a first executable program, where the first executable program is used to implement object positioning; and
a processing module, configured to: identify modules to be accelerated in the first executable program, where a module to be accelerated implements a specific computing function in the first executable program and the plurality of modules to be accelerated in the first executable program are executed sequentially; reconstruct the modules to be accelerated into target modules to obtain a second executable program; and send the second executable program to an FPGA, where the plurality of target modules in the second executable program are executed concurrently so as to realize hardware acceleration of the object positioning program in the FPGA.
Through the transceiver module and the processing module, the above processor implements each function of the processor side of the hardware acceleration system shown in fig. 9; details are not repeated here.
In any of the foregoing or following embodiments, an embodiment of the present application further provides an FPGA applied to an object positioning system. The FPGA includes:
a transceiver module, configured to receive a second executable program from the processor, where a plurality of target modules in the second executable program are executed concurrently, the target modules are obtained by the processor reconstructing the modules to be accelerated after identifying them in a first executable program, and the first executable program is used to implement object positioning; and
a processing module, configured to execute the second executable program to realize hardware acceleration of the object positioning program.
Through the transceiver module and the processing module, the FPGA implements each function of the FPGA side of the hardware acceleration system shown in fig. 9; details are not repeated here.
Having described the method and system of the embodiments of the present application, a hardware acceleration device of the embodiments of the present application is described below with reference to FIG. 10.
The hardware acceleration device 90 in the embodiments of the present application can implement the steps of the hardware acceleration method in the embodiment corresponding to fig. 1. The functions of the hardware acceleration device 90 may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above functions, and these modules may be software and/or hardware. The hardware acceleration device 90 is applied to a server device or a terminal device. It may include a transceiver module 901 and a processing module 902; for the functions of the processing module 902 and the transceiver module 901, refer to the operations performed in the embodiment corresponding to fig. 1, which are not repeated here. For example, the processing module 902 may be configured to control the data transceiving operations of the transceiver module 901.
In some embodiments, the transceiver module 901 is configured to acquire a first executable program; and
the processing module 902 is configured to: identify modules to be accelerated in the first executable program, where a module to be accelerated implements a specific computing function in the first executable program and the plurality of modules to be accelerated in the first executable program are executed sequentially; reconstruct the modules to be accelerated into target modules to obtain a second executable program, where the plurality of target modules in the second executable program are executed concurrently; and send the second executable program to an FPGA, so that the FPGA executes the second executable program to realize hardware acceleration of the object positioning system.
In some embodiments, when reconstructing the modules to be accelerated into target modules to obtain the second executable program, the processing module 902 is configured to:
acquire configuration information of the module to be accelerated, the configuration information including at least input information, output information, clock setting information and reset setting information;
convert first program information, written in a software programming language, in the module to be accelerated into second program information written in a hardware description language; and
reconstruct the target module based on the configuration information and the second program information to obtain the second executable program.
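As an illustration only, the acquisition of configuration information can be sketched in Python. The sketch assumes, hypothetically, that a module to be accelerated is a single Python function with one return statement, that its parameters are the input information and its returned names are the output information, and that the clock and reset settings default to ports named clk and rst_n. The names ModuleConfig and extract_config are hypothetical and are not part of the described system.

```python
import ast
from dataclasses import dataclass


@dataclass
class ModuleConfig:
    """Configuration information of one module to be accelerated (hypothetical layout)."""
    name: str
    inputs: list
    outputs: list
    clock: str = "clk"    # assumed clock setting: name of the clock port
    reset: str = "rst_n"  # assumed reset setting: active-low reset port


def extract_config(source: str) -> ModuleConfig:
    """Derive input/output information from the first function defined in `source`.

    Assumption: the function has a single `return` statement whose value is a
    plain name or a tuple of plain names.
    """
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    inputs = [arg.arg for arg in func.args.args]  # parameters -> input information
    outputs = []
    ret = next((n for n in ast.walk(func) if isinstance(n, ast.Return)), None)
    if ret is not None and ret.value is not None:
        values = ret.value.elts if isinstance(ret.value, ast.Tuple) else [ret.value]
        # returned names -> output information
        outputs = [v.id for v in values if isinstance(v, ast.Name)]
    return ModuleConfig(name=func.name, inputs=inputs, outputs=outputs)


src = """
def distance(x, y):
    d = x * x + y * y
    return d
"""
print(extract_config(src))
# ModuleConfig(name='distance', inputs=['x', 'y'], outputs=['d'], clock='clk', reset='rst_n')
```

A real front end for C or C++ sources would read the same information from function signatures and port pragmas; Python's ast module is used here only because it ships with the standard library.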
In some embodiments, when converting the first program information in the software programming language in the module to be accelerated into the second program information in the hardware description language, the processing module 902 is configured to:
identify keywords in the module to be accelerated, the keywords being preset according to the software programming language;
analyze context information related to the keywords to obtain the first program information corresponding to the keywords; and
reconstruct the first program information to obtain the second program information.
In some embodiments, when identifying keywords in the module to be accelerated, the processing module 902 is configured to:
perform syntax analysis on the code in the module to be accelerated to obtain an abstract syntax tree of the code; and
analyze a plurality of nodes in the abstract syntax tree to obtain keywords in the code that match preset keyword types.
When analyzing the context information related to the keywords to obtain the first program information corresponding to the keywords, the processing module 902 is configured to:
use each keyword to acquire context information of its associated nodes in the abstract syntax tree, thereby obtaining the first program information of the keyword.
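The keyword identification and context extraction steps can be sketched with Python's standard ast module. This is a minimal sketch under stated assumptions: the source is Python rather than the unspecified software language of the embodiment, the "preset keyword types" are taken to be loop and conditional statements, and the context of a keyword is taken to be the source text of its node, which covers the node's children. KEYWORD_TYPES and find_keywords are hypothetical names.

```python
import ast

# Hypothetical preset keyword types: AST node classes whose constructs are
# candidates for translation into hardware (loops and conditionals).
KEYWORD_TYPES = {ast.For: "for", ast.While: "while", ast.If: "if"}


def find_keywords(source: str) -> list:
    """Return (keyword, line number, context) for each node matching a preset type.

    The context is the source text of the matched node, which includes its child
    nodes (loop body, branch bodies); ast.get_source_segment needs Python 3.8+.
    """
    tree = ast.parse(source)  # syntax analysis -> abstract syntax tree
    found = []
    for node in ast.walk(tree):  # visit every node of the tree
        keyword = KEYWORD_TYPES.get(type(node))
        if keyword is not None:
            context = ast.get_source_segment(source, node)
            found.append((keyword, node.lineno, context))
    return sorted(found, key=lambda item: item[1])  # order by source line


src = """def accumulate(xs):
    total = 0
    for x in xs:
        total += x
    return total
"""
for keyword, line, context in find_keywords(src):
    print(keyword, line)
    print(context)
```

Matching on node types rather than on raw text avoids false hits inside strings or comments, which is one reason to work on the syntax tree instead of the source characters.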
In some embodiments, when reconstructing the first program information to obtain the second program information, the processing module 902 is configured to:
convert variables in the first program information according to variable mapping rules between the software programming language and the hardware description language to obtain variables in the second program information; and/or
convert functions in the first program information according to function mapping rules between the software programming language and the hardware description language to obtain functions in the second program information;
where the variable mapping rules and/or the function mapping rules are determined according to the syntax rules of the software programming language and the hardware description language.
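Variable and function mapping rules lend themselves to a table-driven translation. The following is a minimal sketch, assuming C-style fixed-width types on the software side and Verilog on the hardware side; the rule tables, map_variable and map_call are hypothetical and cover only a few illustrative cases.

```python
# Hypothetical variable mapping rules: software type -> HDL declaration template.
VARIABLE_RULES = {
    "bool": "reg {name};",
    "uint8_t": "reg [7:0] {name};",
    "uint16_t": "reg [15:0] {name};",
    "int32_t": "reg signed [31:0] {name};",
}

# Hypothetical function mapping rules: software call -> HDL expression template.
FUNCTION_RULES = {
    "max": "(({0}) > ({1}) ? ({0}) : ({1}))",
    "min": "(({0}) < ({1}) ? ({0}) : ({1}))",
}


def map_variable(sw_type: str, name: str) -> str:
    """Translate a typed software variable into a Verilog declaration."""
    if sw_type not in VARIABLE_RULES:
        raise ValueError(f"no variable mapping rule for type {sw_type!r}")
    return VARIABLE_RULES[sw_type].format(name=name)


def map_call(func: str, *args: str) -> str:
    """Translate a software function call into an equivalent Verilog expression."""
    if func not in FUNCTION_RULES:
        raise ValueError(f"no function mapping rule for {func!r}")
    return FUNCTION_RULES[func].format(*args)


print(map_variable("uint8_t", "pixel"))  # reg [7:0] pixel;
print(map_call("max", "a", "b"))         # ((a) > (b) ? (a) : (b))
```

Keeping the rules in tables ties them directly to the syntax of both languages, and an unsupported type or call fails loudly instead of being silently mistranslated.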
In some embodiments, when reconstructing the target module based on the configuration information and the second program information, the processing module 902 is configured to:
for each module to be accelerated in the first executable program, set the configuration information of the corresponding target module based on that configuration information and the syntax rules of the hardware description language; and
add the corresponding second program information to the configured target module to complete construction of the target module.
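Constructing a target module in two steps (configuration information first, then the translated body) can be sketched as a small Verilog text generator. This sketch assumes, for simplicity, that every data port shares one bit width and that the second program information is already a list of Verilog lines; build_target_module and the port names are hypothetical.

```python
def build_target_module(name, inputs, outputs, body, width=32,
                        clock="clk", reset="rst_n"):
    """Emit a Verilog module: ports from configuration information, then the body.

    Assumption: every data port shares one bit width; `body` is a list of
    already-translated HDL lines (the second program information).
    """
    # Step 1: set the configuration information (clock, reset, inputs, outputs).
    ports = [f"    input  wire {clock}", f"    input  wire {reset}"]
    ports += [f"    input  wire [{width - 1}:0] {p}" for p in inputs]
    ports += [f"    output reg  [{width - 1}:0] {p}" for p in outputs]
    lines = [f"module {name} (", ",\n".join(ports), ");"]
    # Step 2: add the second program information to the configured module.
    lines += [f"    {line}" for line in body]
    lines.append("endmodule")
    return "\n".join(lines)


body = [
    "always @(posedge clk or negedge rst_n) begin",
    "    if (!rst_n) d <= 0;",
    "    else d <= x * x + y * y;",
    "end",
]
print(build_target_module("distance", ["x", "y"], ["d"], body))
```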
In some embodiments, the processing module 902 is further configured to:
analyze the plurality of modules to be accelerated in the first executable program to obtain timing information of these modules;
configure the plurality of target modules onto corresponding computing kernels based on the timing information of the plurality of modules to be accelerated; and
invoke, in a pipelined manner through the computing kernels, the hardware resources corresponding to the target modules, so that the specific computing functions are executed by the plurality of target modules simultaneously.
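To make the benefit of the pipelined mode concrete, the following idealized timing model compares sequential execution of the modules to be accelerated with concurrent execution of the target modules. It assumes that each target module is one pipeline stage with a fixed latency in clock cycles and that stages never stall; the latency values are invented for illustration.

```python
def sequential_cycles(latencies, n_items):
    """Modules to be accelerated run one after another for every data item."""
    return n_items * sum(latencies)


def pipelined_cycles(latencies, n_items):
    """Target modules run as concurrent pipeline stages.

    The first item needs the full pipeline depth; after that, one item completes
    per interval of the slowest stage (idealized: no stalls, no memory limits).
    """
    if n_items == 0:
        return 0
    return sum(latencies) + (n_items - 1) * max(latencies)


stage_latencies = [4, 2, 3]  # hypothetical per-module latencies in clock cycles
print(sequential_cycles(stage_latencies, 100))  # 900
print(pipelined_cycles(stage_latencies, 100))   # 405
```

In this model the steady-state throughput is set by the slowest stage, so balancing stage latencies matters as much as running the stages concurrently.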
In some embodiments, when analyzing the plurality of modules to be accelerated in the first executable program to obtain their timing information, the processing module 902 is configured to:
analyze the first program information in each module to be accelerated to obtain read/write time information, which includes an input value write time and an output value read time; and
determine the time period between the input value write time and the output value read time.
When configuring the plurality of target modules onto corresponding computing kernels based on the timing information of the plurality of modules to be accelerated, the processing module 902 is configured to:
create a computing kernel for each target module; and
cyclically invoke the target modules in the computing kernels, and set the connection relationships among the computing kernels of the target modules according to the time periods.
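One possible reading of how timing information drives the connection of computing kernels is sketched below. It assumes, hypothetically, that kernels are ordered by input write time and that one kernel feeds the next when its output is read no later than the next kernel's input is written; the module names, Timing and connect_kernels are illustrative and not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    module: str
    write_time: int  # cycle at which the module's input value is written
    read_time: int   # cycle at which the module's output value is read

    @property
    def period(self) -> int:
        """Time period between input value write and output value read."""
        return self.read_time - self.write_time


def connect_kernels(timings):
    """Create one computing kernel per target module and chain them by timing.

    Assumption: kernels are ordered by input write time, and a kernel feeds the
    next one when its output is read no later than the next input is written.
    Each edge records the producer's period so later steps could use it.
    """
    ordered = sorted(timings, key=lambda t: t.write_time)
    kernels = {t.module: f"kernel_{t.module}" for t in ordered}
    edges = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.read_time <= nxt.write_time:
            edges.append((kernels[prev.module], kernels[nxt.module], prev.period))
    return kernels, edges


timings = [Timing("filter", 0, 4), Timing("fuse", 6, 9), Timing("locate", 4, 6)]
kernels, edges = connect_kernels(timings)
for edge in edges:
    print(edge)
```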
In some embodiments, the processing module 902 is further configured to:
link the second executable program to a target hardware platform to create a device binary file; and
interact, through a target runtime library and the device binary file, with a register transfer level (RTL) kernel in the programmable logic region of the target hardware platform to realize hardware acceleration of the object positioning program;
where the device binary file describes the hardware circuit structure corresponding to the program.
In some embodiments, the target hardware platform is a target field programmable gate array (FPGA) platform, and
when linking the second executable program to the target hardware platform, the processing module 902 is configured to:
create a programmable logic (PL) kernel from the second executable program through a first development tool; and
link the PL kernel to the target FPGA platform through a second development tool to create the device binary file.
In some embodiments, when creating the PL kernel from the second executable program through the first development tool, the processing module 902 is configured to:
verify the second executable program in a simulator of the first development tool; and
if the second executable program passes verification, package the RTL-level IP core in the second executable program into the PL kernel through a kernel compilation tool of the first development tool.
When linking the PL kernel to the target FPGA platform through the second development tool to create the device binary file, the processing module 902 is configured to:
connect the PL kernel to the target FPGA platform through a linker of the second development tool to obtain the device binary file.
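The ordering of the build steps (verify, package, link) can be captured in a small orchestration function. This is a sketch only: the embodiment does not name the development tools, so the simulator, kernel compilation tool and linker are injected as hypothetical callables, and the stub values below are placeholders rather than real tool output.

```python
from typing import Callable, Sequence


def build_device_binary(
    rtl_sources: Sequence[str],
    simulate: Callable[[Sequence[str]], bool],
    package_kernel: Callable[[Sequence[str]], str],
    link: Callable[[str, str], str],
    platform: str,
) -> str:
    """Verify, package the PL kernel, then link it into a device binary file.

    The three callables stand in for the first development tool's simulator,
    its kernel compilation tool, and the second development tool's linker.
    """
    if not simulate(rtl_sources):
        # Packaging only proceeds once the second executable program passes verification.
        raise RuntimeError("second executable program failed verification")
    pl_kernel = package_kernel(rtl_sources)  # package the RTL-level IP core
    return link(pl_kernel, platform)         # produce the device binary file


# Stub tools for illustration only; no real tool is invoked.
result = build_device_binary(
    ["distance.v"],
    simulate=lambda srcs: True,
    package_kernel=lambda srcs: "pl_kernel",
    link=lambda kernel, plat: f"{kernel}@{plat}.bin",
    platform="target_fpga",
)
print(result)  # pl_kernel@target_fpga.bin
```

Injecting the tools keeps the ordering constraint testable without an FPGA toolchain installed.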
In the embodiments of the present application, the modules to be accelerated that perform specific computing functions in an executable program are identified, and the modules that perform these functions are converted from sequentially executed modules to be accelerated into concurrently executed target modules. This realizes hardware acceleration of the specific computing functions in the executable program and builds a pipeline at the level of the hardware circuit structure, thereby improving the way the program is executed and the data processing efficiency of the executable program.
Having described the methods, systems and apparatus of the embodiments of the present application, a computer-readable storage medium of the embodiments is now described. The medium may be an optical disc storing a computer program (that is, a program product) which, when executed by a processor, performs the steps of the above method embodiments, for example: acquiring a first executable program; identifying modules to be accelerated in the first executable program, where a module to be accelerated implements a specific computing function in the first executable program and the plurality of modules to be accelerated are executed sequentially; and reconstructing the modules to be accelerated into target modules to obtain a second executable program, where the plurality of target modules in the second executable program are executed concurrently to realize hardware acceleration of the object positioning program. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer-readable storage medium may further include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and other optical or magnetic storage media, which are not enumerated further here.
The hardware acceleration device 90 of the embodiments of the present application has been described above from the perspective of modular functional entities; the server and the terminal device that execute the hardware acceleration method of the embodiments are described below from the perspective of hardware processing.
It should be noted that, in the embodiment of the hardware acceleration device of the present application, the physical device corresponding to the transceiver module 901 shown in fig. 10 may be an input/output unit, a transceiver, a radio frequency circuit, a communication module, an input/output (I/O) interface, or the like, and the physical device corresponding to the processing module 902 may be a processor. The hardware acceleration device 90 shown in fig. 10 may have the structure shown in fig. 11. In that case, the processor and the transceiver in fig. 11 implement functions the same as or similar to those of the processing module 902 and the transceiver module 901 in the foregoing device embodiment, and the memory in fig. 11 stores the computer program that the processor invokes when executing the above hardware acceleration method.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing applications 1142 or data 1144. The memory 1132 and the storage medium 1130 may provide transient or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may further include one or more power supplies 1127, one or more wired or wireless network interfaces 1180, one or more input/output interfaces 1159, and/or one or more operating systems 1141, such as Windows Server, Mac OS X, Unix, Linux and FreeBSD.
The steps performed by the server in the above embodiments may be based on the structure of the server 1100 shown in fig. 12. For example, the steps performed by the hardware acceleration device 90 shown in fig. 10 in the above embodiments may be based on this server structure. For example, the CPU 1122 may perform the following operations by invoking instructions in the memory 1132:
acquiring the first executable program through the input/output interface 1159;
identifying modules to be accelerated in the first executable program, where a module to be accelerated implements a specific computing function in the first executable program and the plurality of modules to be accelerated in the first executable program are executed sequentially; and
reconstructing the modules to be accelerated into target modules to obtain a second executable program, where the plurality of target modules in the second executable program are executed concurrently to realize hardware acceleration of the object positioning program.
Each of the foregoing embodiments is described with its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or modules, and may be electrical, mechanical or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented by software, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
The technical solutions provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the embodiments, and the above description is intended only to help understand the methods and core ideas of the embodiments of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application according to the ideas of the embodiments of the present application. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.

Claims (27)

1. A hardware acceleration system, comprising a processor and a field programmable gate array (FPGA), wherein:
the processor is configured to: acquire a first executable program, the first executable program being used to implement object positioning; identify modules to be accelerated, written in a software programming language, in the first executable program, wherein a module to be accelerated is used to implement a specific computing function in the first executable program, and the plurality of modules to be accelerated in the first executable program are executed sequentially; reconstruct the modules to be accelerated into target modules written in a hardware description language to obtain a second executable program; and send the second executable program to the FPGA; wherein the plurality of target modules in the second executable program are executed concurrently; and
the FPGA is configured to receive the second executable program and execute the second executable program to realize hardware acceleration of the object positioning program.
2. The system of claim 1, wherein, when reconstructing the modules to be accelerated into target modules, the processor is configured to:
acquire configuration information of the module to be accelerated, the configuration information comprising at least input information, output information, clock setting information and reset setting information;
convert first program information in the software programming language in the module to be accelerated into second program information in the hardware description language; and
reconstruct the target module based on the configuration information and the second program information to obtain the second executable program.
3. The system of claim 2, wherein, when converting the first program information in the software programming language in the module to be accelerated into the second program information in the hardware description language, the processor is configured to:
identify keywords in the module to be accelerated, the keywords being preset according to the software programming language;
analyze context information related to the keywords to obtain the first program information corresponding to the keywords; and
reconstruct the first program information to obtain the second program information.
4. The system of claim 3, wherein, when identifying keywords in the module to be accelerated, the processor is configured to:
perform syntax analysis on the code in the module to be accelerated to obtain an abstract syntax tree of the code; and
analyze a plurality of nodes in the abstract syntax tree to obtain keywords in the code that match preset keyword types;
and wherein, when analyzing the context information related to the keywords to obtain the first program information corresponding to the keywords, the processor is configured to:
use each keyword to acquire context information of its associated nodes in the abstract syntax tree, thereby obtaining the first program information corresponding to the keyword.
5. The system of claim 3, wherein, when reconstructing the first program information to obtain the second program information, the processor is configured to:
convert variables in the first program information according to variable mapping rules between the software programming language and the hardware description language to obtain variables in the second program information; and/or
convert functions in the first program information according to function mapping rules between the software programming language and the hardware description language to obtain functions in the second program information;
wherein the variable mapping rules and/or the function mapping rules are determined according to the syntax rules of the software programming language and the hardware description language.
6. The system of claim 3, wherein, when reconstructing the target module based on the configuration information and the second program information, the processor is configured to:
for each module to be accelerated in the first executable program, set the configuration information of the corresponding target module based on that configuration information and the syntax rules of the hardware description language; and
add the corresponding second program information to the configured target module to complete construction of the target module.
7. The system of any one of claims 1 to 6, wherein the processor is further configured to:
analyze the plurality of modules to be accelerated in the first executable program to obtain timing information of the plurality of modules to be accelerated;
configure the plurality of target modules onto corresponding computing kernels based on the timing information of the plurality of modules to be accelerated; and
invoke, in a pipelined manner through the computing kernels, the hardware resources corresponding to the target modules, so that the specific computing functions are executed by the plurality of target modules simultaneously.
8. The system of claim 7, wherein, when analyzing the plurality of modules to be accelerated in the first executable program to obtain the timing information of the plurality of modules to be accelerated, the processor is configured to:
analyze the first program information in each module to be accelerated to obtain read/write time information, the read/write time information comprising an input value write time and an output value read time; and
determine the time period between the input value write time and the output value read time;
and wherein, when configuring the plurality of target modules onto the corresponding computing kernels based on the timing information of the plurality of modules to be accelerated, the processor is configured to:
create a computing kernel for each target module; and
cyclically invoke the target modules in the computing kernels, and set the connection relationships among the computing kernels of the target modules according to the time periods.
9. The system of any one of claims 1 to 6, wherein the processor is further configured to:
after reconstructing the modules to be accelerated into target modules to obtain the second executable program, link the second executable program to a target hardware platform to create a device binary file; and
interact, through a target runtime library and the device binary file, with a register transfer level (RTL) kernel in the programmable logic region of the target hardware platform to realize hardware acceleration of the object positioning program;
wherein the device binary file describes the hardware circuit structure corresponding to the executable program.
10. The system of claim 9, wherein the target hardware platform is a target FPGA platform; and
when linking the second executable program to the target hardware platform to create the device binary file, the processor is configured to:
create a programmable logic (PL) kernel from the second executable program through a first development tool; and
link the PL kernel to the target FPGA platform through a second development tool to create the device binary file.
11. The system of claim 10, wherein, when creating the PL kernel from the second executable program through the first development tool, the processor is configured to:
verify the second executable program in a simulator of the first development tool; and
if the second executable program passes verification, package the RTL-level IP core in the second executable program into the PL kernel through a kernel compilation tool of the first development tool;
and wherein, when linking the PL kernel to the target FPGA platform through the second development tool to create the device binary file, the processor is configured to:
connect the PL kernel to the target FPGA platform through a linker of the second development tool to obtain the device binary file.
12. A processor for use in an object positioning system, the processor comprising:
a transceiver module, configured to acquire a first executable program, the first executable program being used to implement object positioning; and
a processing module, configured to: identify modules to be accelerated, written in a software programming language, in the first executable program, wherein a module to be accelerated is used to implement a specific computing function in the first executable program, and the plurality of modules to be accelerated in the first executable program are executed sequentially; reconstruct the modules to be accelerated into target modules written in a hardware description language to obtain a second executable program; and send the second executable program to an FPGA; wherein the plurality of target modules in the second executable program are executed concurrently to realize hardware acceleration of the object positioning program in the FPGA.
13. An FPGA for use in an object positioning system, the FPGA comprising:
a transceiver module, configured to receive a second executable program from a processor, wherein a plurality of target modules written in a hardware description language in the second executable program are executed concurrently, the target modules being obtained by the processor reconstructing modules to be accelerated, written in a software programming language, after identifying them in a first executable program, and the first executable program being used to implement object positioning; and
a processing module, configured to execute the second executable program to realize hardware acceleration of the object positioning program.
14. A hardware acceleration method for an object positioning system, the method comprising:
acquiring a first executable program, the first executable program being used to implement object positioning;
identifying modules to be accelerated, written in a software programming language, in the first executable program, wherein a module to be accelerated is used to implement a specific computing function in the first executable program, and the plurality of modules to be accelerated in the first executable program are executed sequentially;
reconstructing the modules to be accelerated into target modules written in a hardware description language to obtain a second executable program, wherein the plurality of target modules in the second executable program are executed concurrently; and
sending the second executable program to an FPGA, so that the FPGA executes the second executable program to realize hardware acceleration of the object positioning system.
15. The method of claim 14, wherein reconstructing the modules to be accelerated into target modules to obtain the second executable program comprises:
acquiring configuration information of the module to be accelerated, the configuration information comprising at least input information, output information, clock setting information and reset setting information;
converting first program information in the software programming language in the module to be accelerated into second program information in the hardware description language; and
reconstructing the target module based on the configuration information and the second program information to obtain the second executable program.
16. The method of claim 15, wherein converting the first program information in the software programming language in the module to be accelerated into the second program information in the hardware description language comprises:
identifying keywords in the module to be accelerated, the keywords being preset according to the software programming language;
analyzing context information related to the keywords to obtain the first program information corresponding to the keywords; and
reconstructing the first program information to obtain the second program information.
17. The method of claim 16, wherein the identifying keywords in the module to be accelerated comprises:
performing syntax analysis on the code information in the module to be accelerated to obtain an abstract syntax tree of the code information;
analyzing a plurality of nodes in the abstract syntax tree to obtain keywords in the code information that match preset keyword types;
wherein analyzing the context information related to the keywords to obtain the first program information corresponding to the keywords comprises:
acquiring, by using the keywords, context information of the associated nodes in the abstract syntax tree to obtain the first program information corresponding to the keywords.
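The parsing and keyword-matching steps of claim 17 map naturally onto Python's standard `ast` module. In this sketch the set of preset keyword types (`for`, `while`, `if`, `return`) is an assumption, and the "context information" gathered for each match is taken to be the variable names used in the keyword's subtree plus its source text; the patent does not define either more precisely.

```python
import ast

# Hypothetical preset keyword types; the claims leave the actual set open.
KEYWORD_TYPES = {ast.For: "for", ast.While: "while", ast.If: "if", ast.Return: "return"}


def find_keywords(source: str) -> list[dict]:
    """Parse source into an AST and collect keyword nodes with their context."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        kw = KEYWORD_TYPES.get(type(node))
        if kw is None:
            continue
        # Context from associated nodes: every variable name in this node's subtree.
        names = sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})
        found.append({
            "keyword": kw,
            "line": node.lineno,
            "names": names,
            "code": ast.get_source_segment(source, node),
        })
    # ast.walk is breadth-first, so sort by line to restore source order.
    return sorted(found, key=lambda d: d["line"])


SRC = """def acc(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total
"""

for item in find_keywords(SRC):
    print(item["keyword"], item["line"], item["names"])
# for 3 ['total', 'x', 'xs']
# if 4 ['total', 'x']
# return 6 ['total']
```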
18. The method of claim 16, wherein reconstructing the first program information to obtain the second program information comprises:
converting variables in the first program information according to variable mapping rules between the software programming language and the hardware description language to obtain variables in the second program information; and/or
converting functions in the first program information according to function mapping rules between the software programming language and the hardware description language to obtain functions in the second program information;
wherein the variable mapping rules and/or the function mapping rules are determined according to the syntax rules of the software programming language and the hardware description language.
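The variable and function mapping rules of claim 18 can be pictured as lookup tables from software constructs to hardware description templates. The Python sketch below uses hypothetical tables that target Verilog (type widths, the `uint8` type name, and the expression templates are illustrative assumptions) and raises an error when no rule exists, rather than guessing a translation.

```python
# Hypothetical mapping tables; the claims say only that such rules are
# derived from the syntax of both languages.
VARIABLE_RULES = {
    "int":   "reg signed [31:0] {name};",
    "bool":  "reg {name};",
    "uint8": "reg [7:0] {name};",
}

FUNCTION_RULES = {
    "abs": "(({a}) < 0 ? -({a}) : ({a}))",
    "min": "(({a}) < ({b}) ? ({a}) : ({b}))",
    "max": "(({a}) > ({b}) ? ({a}) : ({b}))",
}


def map_variable(name: str, py_type: str) -> str:
    """Translate a typed software variable into a Verilog declaration."""
    try:
        return VARIABLE_RULES[py_type].format(name=name)
    except KeyError:
        raise ValueError(f"no variable mapping rule for type {py_type!r}") from None


def map_call(func: str, *args: str) -> str:
    """Translate a software function call into a Verilog expression."""
    template = FUNCTION_RULES.get(func)
    if template is None:
        raise ValueError(f"no function mapping rule for {func!r}")
    # Positional arguments fill the {a}, {b} placeholders in order.
    return template.format(**dict(zip("ab", args)))


print(map_variable("total", "int"))  # reg signed [31:0] total;
print(map_call("abs", "x"))          # ((x) < 0 ? -(x) : (x))
```

Built-ins such as `abs` become combinational expressions in this sketch; a real tool might instead instantiate dedicated hardware blocks.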
19. The method of claim 16, wherein reconstructing the target module based on the configuration information and the second program information comprises:
for each module to be accelerated in the first executable program, setting the configuration information in the corresponding target module based on the configuration information and the syntax rules of the hardware description language;
and adding the corresponding second program information to the configured target module to complete construction of the target module.
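Claim 19's two-stage construction (first set the configuration, then add the translated program information) can be sketched as a Python generator of Verilog-2001 text. The 32-bit signed data width, the asynchronous active-low reset, and the default port names are assumptions made for this illustration, not details from the claims.

```python
def build_verilog_module(name, inputs, outputs, body_lines, clock="clk", reset="rst_n"):
    """Assemble a Verilog-2001 module from configuration info and translated statements.

    Assumptions (illustrative only): all data ports are 32-bit signed, and the
    reset is asynchronous and active-low.
    """
    # Stage 1: configuration information -> ANSI-style port list, clock and reset first.
    ports = [f"    input wire {clock}", f"    input wire {reset}"]
    ports += [f"    input wire signed [31:0] {p}" for p in inputs]
    ports += [f"    output reg signed [31:0] {p}" for p in outputs]
    lines = [f"module {name} (", ",\n".join(ports), ");"]
    # Stage 2: second program information -> body of a clocked always block.
    lines.append(f"    always @(posedge {clock} or negedge {reset}) begin")
    lines.append(f"        if (!{reset}) begin")
    lines += [f"            {p} <= 0;" for p in outputs]
    lines.append("        end else begin")
    lines += [f"            {stmt}" for stmt in body_lines]
    lines.append("        end")
    lines.append("    end")
    lines.append("endmodule")
    return "\n".join(lines)


text = build_verilog_module("scale", ["x", "k"], ["y"], ["y <= x * k;"])
print(text)
```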
20. The method according to any one of claims 14 to 19, further comprising:
analyzing the plurality of modules to be accelerated in the first executable program to obtain timing information of the plurality of modules to be accelerated;
configuring the plurality of target modules to corresponding computing kernels, respectively, based on the timing information of the plurality of modules to be accelerated;
and invoking, through the computing kernels, the hardware resources corresponding to the target modules in a pipelined manner, so that the plurality of target modules execute their specific computing functions simultaneously.
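Why pipelining the computing kernels in claim 20 pays off can be shown with a small Python timing model. Assuming, for illustration, stage latencies in clock cycles and unbounded buffering between stages, N inputs take N·Σl cycles when the modules run sequentially, but only Σl + (N−1)·max(l) cycles when the stages overlap; the simulation below computes the pipelined case step by step rather than from the formula.

```python
def sequential_time(latencies, n_items):
    """Each input passes through every stage before the next input starts."""
    return n_items * sum(latencies)


def pipelined_time(latencies, n_items):
    """Simulate a pipeline where each stage processes one input at a time.

    Assumes unbounded buffers between stages, so an input can leave a stage
    even if the next stage is still busy.
    """
    finish_prev_item = [0] * len(latencies)  # per-stage finish time of the previous input
    for _ in range(n_items):
        t = 0  # time at which this input is available to the next stage
        finish_this_item = []
        for s, lat in enumerate(latencies):
            # Start once the input has arrived AND the stage has finished the previous input.
            start = max(t, finish_prev_item[s])
            t = start + lat
            finish_this_item.append(t)
        finish_prev_item = finish_this_item
    return finish_prev_item[-1]


stages = [2, 5, 3]  # assumed latencies (cycles) of three target modules
print(sequential_time(stages, 4))  # 40
print(pipelined_time(stages, 4))   # 25 = 10 + 3 * 5
```

The slowest stage (5 cycles here) sets the steady-state rate, which is why balancing module latencies matters when configuring kernels.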
21. The method of claim 20, wherein analyzing the plurality of modules to be accelerated in the first executable program to obtain timing information of the plurality of modules to be accelerated comprises:
analyzing the first program information in the module to be accelerated to obtain read/write time information, the read/write time information comprising an input value write time and an output value read time;
determining the time period between the input value write time and the output value read time;
wherein configuring the plurality of target modules to the corresponding computing kernels based on the timing information of the plurality of modules to be accelerated comprises:
creating a computing kernel corresponding to each target module;
and cyclically invoking the target modules in the computing kernels, and setting the connection relationship between the computing kernels corresponding to the target modules according to the time period.
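One possible reading of claim 21 is sketched below in Python: each module's time period is its output read time minus its input write time, kernels are chained in order of input write time, and the longest period bounds how often a new input can enter the chain. The chaining rule, the interval interpretation, and the module names and cycle values are all hypothetical; the claims do not specify how the period determines the connections.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    module: str
    write_time: int  # cycle at which the module's input value is written
    read_time: int   # cycle at which the module's output value is read

    @property
    def period(self) -> int:
        # Time period between the input value write and the output value read.
        return self.read_time - self.write_time


def connect_kernels(timings):
    """Order kernels by input write time and chain each one to the next.

    Returns (edges, interval): edges are (producer, consumer) pairs, and
    interval is the longest period, i.e. how often a new input can enter
    the chain in this simplified model.
    """
    ordered = sorted(timings, key=lambda t: t.write_time)
    for t in ordered:
        if t.period < 0:
            raise ValueError(f"{t.module}: output read before input write")
    edges = [(a.module, b.module) for a, b in zip(ordered, ordered[1:])]
    interval = max((t.period for t in ordered), default=0)
    return edges, interval


timings = [Timing("centroid", 7, 10), Timing("filter", 0, 2), Timing("transform", 2, 7)]
edges, interval = connect_kernels(timings)
print(edges)     # [('filter', 'transform'), ('transform', 'centroid')]
print(interval)  # 5
```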
22. The method according to any one of claims 14 to 19, further comprising, after reconstructing the module to be accelerated into a target module to obtain the second executable program:
linking the second executable program to a target hardware platform to create a device binary file;
interacting, through a target runtime library and the device binary file, with a register transfer level (RTL) kernel in a programmable logic region of the target hardware platform to achieve hardware acceleration of the object positioning program;
wherein the device binary file describes the hardware circuit structure corresponding to the executable program.
23. The method of claim 22, wherein the target hardware platform is a target field programmable gate array (FPGA) platform;
linking the second executable program to the target hardware platform to create the device binary file comprises:
performing programmable logic (PL) kernel creation on the second executable program through a first development tool to obtain a PL kernel;
and linking the PL kernel to a target FPGA platform through a second development tool, and creating the device binary file.
24. The method of claim 23, wherein performing programmable logic (PL) kernel creation on the second executable program through the first development tool to obtain the PL kernel comprises:
verifying the second executable program in a simulator of the first development tool;
if the second executable program passes the verification, packaging an RTL-level IP core in the second executable program into the PL kernel through a kernel compilation tool in the first development tool;
linking the PL kernel to the target FPGA platform through the second development tool to create the device binary file comprises:
linking the PL kernel with the target FPGA platform through a linker of the second development tool to obtain the device binary file.
25. A method of hardware acceleration for an object positioning system, the method comprising:
receiving a second executable program from a processor, wherein a plurality of target modules written in a hardware description language in the second executable program are executed concurrently, the target modules are obtained by the processor identifying a module to be accelerated that is written in a software programming language in a first executable program and reconstructing that module, and the first executable program is used to implement object positioning;
and executing the second executable program to achieve hardware acceleration of the object positioning system.
26. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the hardware acceleration method of any one of claims 14-24.
27. A computing device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the hardware acceleration method of any one of claims 14-24 when executing the computer program.
CN202310502779.6A 2023-05-06 2023-05-06 Hardware acceleration system, method and related device Active CN116228515B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310502779.6A CN116228515B (en) 2023-05-06 2023-05-06 Hardware acceleration system, method and related device

Publications (2)

Publication Number Publication Date
CN116228515A CN116228515A (en) 2023-06-06
CN116228515B CN116228515B (en) 2023-08-18

Family

ID=86569799

Country Status (1)

Country Link
CN (1) CN116228515B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076725B (en) * 2023-09-12 2024-02-09 北京云枢创新软件技术有限公司 Method, electronic device and medium for searching tree nodes based on underlying data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324558A (en) * 2020-02-05 2020-06-23 苏州浪潮智能科技有限公司 Data processing method and device, distributed data stream programming framework and related components
CN112631684A (en) * 2020-12-30 2021-04-09 北京元心科技有限公司 Executable program running method and device, electronic equipment and computer storage medium
CN113312098A (en) * 2020-04-01 2021-08-27 阿里巴巴集团控股有限公司 Program loading method, device, system and storage medium

Also Published As

Publication number Publication date
CN116228515A (en) 2023-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant