CN117313814A

CN117313814A - Intelligent decision-making and power-calculating acceleration system of unmanned aerial vehicle

Info

Publication number: CN117313814A
Application number: CN202311596320.3A
Authority: CN
Inventors: 高阳; 李�浩; 常惠; 聂勤; 张启悦; 郝彦济; 张鑫辰
Original assignee: Institute of Automation of Chinese Academy of Science; AVIC Chengdu Aircraft Design and Research Institute
Current assignee: Institute of Automation of Chinese Academy of Science; AVIC Chengdu Aircraft Design and Research Institute
Priority date: 2023-11-28
Filing date: 2023-11-28
Publication date: 2023-12-29
Anticipated expiration: 2043-11-28
Also published as: CN117313814B

Abstract

The invention relates to the technical field of embedded software and hardware, and provides an intelligent decision-making power acceleration system of an unmanned aerial vehicle. According to the system, the advantages of the central processor and the FPGA chip are combined, tasks are decomposed and executed on the central processor and the FPGA chip, and the target intelligent decision model is deployed on the FPGA chip, so that the processing cost of the tasks can be reduced, the task processing efficiency is improved, the operation efficiency of the target intelligent decision model is improved, and the intelligent decision efficiency of the unmanned aerial vehicle is further improved. Moreover, the embedded heterogeneous hardware structure of the system can realize high-speed communication through the XDMA system, and the problem of transmission delay caused by task decomposition is reduced.

Description

Intelligent decision-making and power-calculating acceleration system of unmanned aerial vehicle

Technical Field

The invention relates to the technical field of embedded software and hardware, in particular to an intelligent decision-making power acceleration system of an unmanned aerial vehicle.

Background

With the rapid fielding of new weaponry, the data collected and processed by intelligent control and tactical decision systems is growing exponentially, and existing airborne computing systems struggle to meet current air combat requirements in terms of computing power, accuracy, reliability, visualization and the like. To address the needs of aviation computing tasks, custom hardware accelerator architectures can be optimized for specific problems.

A field programmable gate array (FPGA) is composed of gate circuits and functional modules and forms an embedded accelerator with reconfigurable capability. Its greatest acceleration advantage is that data can be processed in a pipelined manner on clock cycles, and multiple hardware circuits can process different computing tasks concurrently. In addition, the abundant logic resources of the FPGA can be configured as different types of interface circuits, so that video streams can be read directly from an external interface, which greatly reduces processing delay.

However, a computing architecture based on a single FPGA accelerator cannot meet the needs of multiple types of aviation embedded computing tasks.

Disclosure of Invention

The invention provides an unmanned aerial vehicle intelligent decision-making power acceleration system which is used for solving the defects in the prior art.

The invention provides an unmanned aerial vehicle intelligent decision power acceleration system, which comprises an embedded heterogeneous hardware structure constructed based on a central processing unit and an FPGA chip, wherein the central processing unit and the FPGA chip are interconnected based on an XDMA system;

the XDMA system comprises a PCIE system positioned at one side of the central processor and a PCIE interface corresponding to the PCIE system on the FPGA chip, wherein a first base address register space and a second base address register space are configured in the PCIE system, the first base address register space is used for accessing a control register group corresponding to the FPGA chip, and the second base address register space is used for accessing a data interface of the FPGA chip;

A target intelligent decision model is deployed on the FPGA chip;

the central processing unit is used for acquiring real-time state information of the unmanned aerial vehicle to be decided and transmitting the real-time state information to the data interface through the PCIE system;

the FPGA chip is used for calling the target intelligent decision model based on the control register set, inputting the real-time state information into the target intelligent decision model, selecting an optimal target low-level strategy from all target low-level strategies by a target strategy selector in the target intelligent decision model based on the real-time state information as a control strategy, and returning the control strategy to the central processor through the PCIE interface;

and the central processing unit is also used for controlling the actions of the unmanned aerial vehicle to be decided based on the control strategy.

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, the control register set comprises: a control register, a status register, an input data register, and an output data register;

the FPGA chip further comprises a programmable logic unit;

the programmable logic unit is used for writing the real-time state information into the input data register, calling the target intelligent decision model to calculate after the start bit of the control register is written with 1, and writing a calculation result into the output data register;

The programmable logic unit is further configured to read a calculation status bit that indicates whether the programmable logic unit has completed calculation, and output a calculation result when the calculation status bit is zero as the control policy.

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, the IP interface of the target intelligent decision-making model comprises a control interface, an input signal interface, an output signal interface and an output state interface;

the control interface is used for receiving a starting signal and an ending indication signal; the starting signal is used for indicating the target intelligent decision model to start calculation, and the ending indicating signal is used for indicating the target intelligent decision model to finish calculation;

the input signal interface is used for accessing the real-time state information;

the output state interface is used for outputting the effective state of the control strategy;

the output signal interface is used for latching and outputting the control strategy when the control interface receives the ending indication signal and the valid state is valid.

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, the central processing unit is further used for acquiring, in a simulated combat scenario between the unmanned aerial vehicle agents of two sides, motion state information of the unmanned aerial vehicle models of both sides, performing damage calculation and win-lose judgment on the simulated combat process of the unmanned aerial vehicle models of both sides, constructing a training data set, and transmitting the training data set to the FPGA chip through the PCIE system;

The FPGA chip is further used for performing hierarchical training on the strategy selector and each low-level strategy in the initial intelligent decision model by adopting a target deep Q network based on the training data set, to obtain the target intelligent decision model;

wherein the low-level strategies comprise a control area strategy, an aggressive shooting strategy and a defensive shooting strategy, and the strategy selector is used for selecting the optimal low-level strategy from the low-level strategies; the evaluation network and the target network of the target deep Q network both comprise a dueling (competing) network structure, and the dueling network structure comprises a state value network and an action advantage network, both connected to the hidden layer.
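The combination of a state value stream and an action advantage stream can be illustrated with a minimal sketch. It assumes the commonly used mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean(A); this passage does not state which aggregation the system uses, and the function name `dueling_q_values` is hypothetical.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a scalar state value with per-action advantages.

    Assumption: mean-subtracted aggregation, which is one standard choice
    for dueling networks and is not confirmed by the text.
    """
    if not advantages:
        raise ValueError("at least one action advantage is required")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


if __name__ == "__main__":
    # One advantage per low-level strategy (three in the description).
    print(dueling_q_values(2.0, [1.0, 0.0, -1.0]))
```

Subtracting the mean keeps the value and advantage streams identifiable: adding a constant to every advantage leaves the resulting Q values unchanged.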

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, the embedded heterogeneous hardware structure is verified through register-transfer level (RTL) circuit simulation, and the target intelligent decision model after hierarchical training is tested.

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, an AXI-Lite slave interface is configured on the FPGA chip;

the central processing unit is further used for sending an access request to the AXI-Lite slave interface based on the first base address register space;

The FPGA chip is also used for converting the access request into a read-write request of the control register set and accessing the control register set based on the read-write request.

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, the data interface is an AXI interface and is used for accessing the static random access memory on the FPGA chip and the DDR controller corresponding to the FPGA chip through the AXI bus.

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, the DDR controller comprises a control module and a physical interface;

the control module is used for receiving the transmission data of the AXI bus and adopting the physical interface to read and write access the DDR based on the transmission data.

According to the unmanned aerial vehicle intelligent decision-making power acceleration system provided by the invention, the target intelligent decision-making model is generated by high-level synthesis (HLS).

According to the intelligent decision-making power acceleration system of the unmanned aerial vehicle, the central processing unit is provided with a user interface;

the central processing unit is specifically configured to obtain the real-time status information through the user interface, and transmit the real-time status information to the FPGA chip through the PCIE system.

The intelligent decision-making power accelerating system of the unmanned aerial vehicle, which is provided by the invention, meets various types of aviation embedded computing tasks by means of an embedded heterogeneous hardware structure constructed by a central processing unit and an FPGA chip. According to the system, the advantages of the central processor and the FPGA chip are combined, tasks are decomposed and executed on the central processor and the FPGA chip, and the target intelligent decision model is deployed on the FPGA chip, so that the processing cost of the tasks can be reduced, the task processing efficiency is improved, the operation efficiency of the target intelligent decision model is improved, and the intelligent decision efficiency of the unmanned aerial vehicle is further improved. Moreover, the embedded heterogeneous hardware structure of the system can realize high-speed communication through the XDMA system, and the problem of transmission delay caused by task decomposition is reduced.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.

FIG. 1 is a schematic diagram of the architecture of the unmanned aerial vehicle intelligent decision-making power acceleration system provided by the invention;

FIG. 2 is a logic schematic diagram of a programmable logic unit of the FPGA chip 2 in the intelligent decision-making acceleration system of the unmanned aerial vehicle according to the present invention;

FIG. 3 is a schematic diagram of an IP interface of the target intelligent decision model provided by the invention;

FIG. 4 is a schematic diagram of a deployment structure of a target intelligent decision model on an FPGA chip;

FIG. 5 is a schematic view of the range of missile attack by the unmanned aerial vehicle agent of the present invention;

FIG. 6 is a schematic diagram of a dueling (competing) network structure provided by the present invention;

FIG. 7 is a schematic diagram of a code flow for deploying and testing a target intelligent decision model on an FPGA chip;

FIG. 8 is a schematic diagram showing the comparison of experimental results of the target intelligent decision model provided by the invention on different hardware platforms.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Features qualified by "first", "second" and the like in the description and in the claims may explicitly or implicitly include one or more such features. In the description of the invention, unless otherwise indicated, "a plurality" means two or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.

FIG. 1 is a schematic structural diagram of the intelligent decision-making power acceleration system of an unmanned aerial vehicle provided by an embodiment of the invention. As shown in FIG. 1, the system comprises an embedded heterogeneous hardware structure constructed based on a central processing unit (Central Processing Unit, CPU) 1 and a field programmable gate array (Field Programmable Gate Array, FPGA) chip 2, wherein the CPU1 and the FPGA chip 2 are interconnected based on an XDMA (DMA subsystem for PCI Express) system.

The XDMA system comprises a high-speed serial computer expansion bus standard (Peripheral Component Interconnect Express, PCIE) system 3 at the CPU1 side and a PCIE interface 21 corresponding to the PCIE system 3 on the FPGA chip 2, a first base address register space 31 and a second base address register space 32 are configured in the PCIE system 3, the first base address register space 31 is used for accessing a control register set corresponding to the FPGA chip 2, and the second base address register space 32 is used for accessing a data interface of the FPGA chip 2.

The FPGA chip 2 is provided with a target intelligent decision model.

The CPU1 is used for acquiring real-time state information of the unmanned aerial vehicle to be decided, and transmitting the real-time state information to a data interface of the FPGA chip 2 through the PCIE system 3.

The FPGA chip 2 is configured to invoke a target intelligent decision model based on a control register set, input real-time status information to the target intelligent decision model, select, by a target policy selector in the target intelligent decision model, an optimal target low-level policy from among the target low-level policies based on the real-time status information as a control policy, and return the control policy to the CPU1 through the PCIE interface 21.

The CPU1 is also used for controlling the actions of the unmanned aerial vehicle to be decided based on the control strategy.

Specifically, the unmanned aerial vehicle intelligent decision-making power acceleration system provided by the embodiment of the invention realizes real-time computation for unmanned aerial vehicle intelligent decision-making tasks through an embedded hardware structure. A hardware logic interface module is developed so that the software and hardware architecture can be configured and optimized. Driver software is developed for this software and hardware architecture: through communication between the PCIE system 3 and the PCIE interface 21, the CPU1 performs flow control by reading and writing the control register set corresponding to the FPGA chip 2.

The CPU1 is good at processing tasks of control flow type, and has a strong computing power for small-scale signal processing parallel tasks by integrating a dedicated vector operation unit. The FPGA chip 2 excels in the video stream processing problem with local correlation, and has rich interfaces and ultra-low latency.

The CPU1 may be a PowerPC processor (Performance Optimization With Enhanced RISC - Performance Computing, sometimes abbreviated as PPC), which is a central processing unit of reduced instruction set computer (RISC) architecture. The CPU1 may be a processor system based on an MPC8641D chip and is used for providing data input and control functions.

Two base address register (Base Address Register, BAR) spaces, a first base address register space (BAR 0) 31 and a second base address register space (BAR 2) 32, respectively, may be configured in PCIE system 3. By accessing the first base address register space, a control register group corresponding to the FPGA chip 2 can be accessed; by accessing the second base address register space, the data interface of the FPGA chip 2 can be accessed.

The XDMA system can convert PCIE bus transactions into AXI bus transactions, thereby completing data exchange with the internal hardware acceleration logic. With the support of the XDMA IP, the FPGA chip 2 is attached to the CPU1 as a PCIE endpoint device.

The XDMA system can provide a user-selectable AXI4 interface or AXI4-Stream interface, can be added to the system bus interconnect, and is suitable for asynchronous transmission of large volumes of data. Starting from the head address of a linked-list structure, the XDMA system completes in turn the transfer tasks specified by the linked list. It mainly comprises a configuration read-write interface, data read-write interface logic, an interrupt processing module and a PCIE protocol processing module.

The configuration read-write interface allows the user logic to access registers inside the IP through an AXI-Lite bus interface, and also supports direct PCIE memory access from the host side, so that transparent transmission from the host to the user logic can be provided. The data read-write interface logic supports AXI4/AXI4-Stream protocol access: access from the host side to the user logic is completed through the H2C channel, and access from the user logic to the host side is completed through the C2H channel. The interrupt processing module receives interrupt signals generated by the user logic and sends them to the host side through the PCIE protocol processing module; it supports legacy interrupts and MSI interrupts. The PCIE protocol processing module is responsible for forwarding the PCIE protocol and generating the relevant control signals.
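The linked-list driven transfer mentioned above can be sketched in software. This is a simplified model and not the actual XDMA descriptor format: the `Descriptor` fields, the `run_chain` function and the use of a `bytearray` as a stand-in for memory are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Descriptor:
    """Hypothetical, simplified transfer descriptor (not the real XDMA layout)."""
    src: int                    # source offset in the simulated memory
    dst: int                    # destination offset in the simulated memory
    length: int                 # number of bytes to move
    next_addr: Optional[int]    # address of the next descriptor, None at the end


def run_chain(descriptors: Dict[int, Descriptor], head_addr: int,
              memory: bytearray) -> int:
    """Walk the chain from head_addr, performing each transfer in turn.

    Returns the total number of bytes moved.
    """
    moved = 0
    visited = set()
    addr: Optional[int] = head_addr
    while addr is not None:
        if addr in visited:
            raise ValueError("descriptor chain contains a loop")
        visited.add(addr)
        d = descriptors[addr]
        memory[d.dst:d.dst + d.length] = memory[d.src:d.src + d.length]
        moved += d.length
        addr = d.next_addr
    return moved


if __name__ == "__main__":
    mem = bytearray(b"ABCDEFGH" + bytes(8))
    chain = {
        0x10: Descriptor(src=0, dst=8, length=4, next_addr=0x20),
        0x20: Descriptor(src=4, dst=12, length=4, next_addr=None),
    }
    print(run_chain(chain, 0x10, mem), bytes(mem[8:16]))
```

Only the head address has to be handed to the engine; every later transfer is discovered by following `next_addr`, which is what allows a batch of transfers to be queued with a single write.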

FPGA chip 2 may be VCU118 and FPGA chip 2 may be deployed on an FPGA development board. Thus, the embedded heterogeneous acceleration hardware structure of PowerPC+VCU 118 can be constituted by CPU1 and FPGA chip 2.

The FPGA chip 2 may include programmable input-output units, programmable logic units, complete clock management, static Random-Access Memory (SRAM), abundant wiring resources, embedded underlying functional units, embedded dedicated hardware modules, control register sets, and data interfaces.

The programmable logic unit may be implemented based on an application layer gateway (Application Layer Gateway, ALG). The control register set may include a control register (CFG_CTRL_REG), a status register (CFG_STATUS_REG), input data registers (CFG_DIN_REG0-7), and output data registers (CFG_DOUT_REG0-3). The base address of the control register set may be 0xA4000000, the offset address of the control register may be 0x0, the offset address of the status register may be 0x4, the offset addresses of the input data registers may be 0x100 to 0x11c, and the offset addresses of the output data registers may be 0x200 to 0x20c.
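The register layout above can be written down as a small address map. The base address and offsets are taken from the description; the 4-byte register stride is an inference from the stated ranges (eight registers spanning 0x100 to 0x11c, four spanning 0x200 to 0x20c), and the helper names are hypothetical.

```python
CTRL_BASE = 0xA4000000    # base address of the control register set

CFG_CTRL_REG = 0x000      # control register offset
CFG_STATUS_REG = 0x004    # status register offset
CFG_DIN_BASE = 0x100      # offset of CFG_DIN_REG0
CFG_DOUT_BASE = 0x200     # offset of CFG_DOUT_REG0
REG_STRIDE = 4            # assumed: inferred from the stated offset ranges

NUM_DIN = 8               # CFG_DIN_REG0 .. CFG_DIN_REG7
NUM_DOUT = 4              # CFG_DOUT_REG0 .. CFG_DOUT_REG3


def din_offset(index: int) -> int:
    """Offset of CFG_DIN_REG<index>."""
    if not 0 <= index < NUM_DIN:
        raise ValueError("input data register index must be 0..7")
    return CFG_DIN_BASE + REG_STRIDE * index


def dout_offset(index: int) -> int:
    """Offset of CFG_DOUT_REG<index>."""
    if not 0 <= index < NUM_DOUT:
        raise ValueError("output data register index must be 0..3")
    return CFG_DOUT_BASE + REG_STRIDE * index


def absolute(offset: int) -> int:
    """Absolute address of a register given its offset."""
    return CTRL_BASE + offset


if __name__ == "__main__":
    print(hex(absolute(din_offset(7))), hex(absolute(dout_offset(3))))
```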

The data interface may be used to access the storage space corresponding to the FPGA chip 2. This storage space may include an internal data storage space and an external data storage space; the internal data storage space may be the SRAM on the FPGA chip 2, and the external data storage space may be the double data rate (DDR) memory corresponding to the FPGA chip 2.

It will be appreciated that SRAM is a memory space located on FPGA chip 2 and DDR is a memory space located outside FPGA chip 2. The data sharing between the FPGA chip 2 and the CPU1 can be realized through SRAM and DDR.

Here, the target intelligent decision model may be deployed and configured on the FPGA chip 2, and the input and output results are stored using a control register set corresponding to the FPGA chip 2.

The unmanned aerial vehicle to be decided can be any unmanned aerial vehicle in a combat scenario. The unmanned aerial vehicle agent to be decided is a virtual artificial intelligence system that controls the movement of the unmanned aerial vehicle to be decided so as to carry out combat between the two sides.

The real-time state information of the unmanned aerial vehicle to be decided may include aerodynamic information including a sailing speed and a sailing acceleration of the unmanned aerial vehicle to be decided, position information including three-dimensional position coordinates in the motion state information, and posture information including a track pitch angle, a track yaw angle, and an angle change rate of the unmanned aerial vehicle to be decided.

The FPGA chip 2 may invoke the target intelligent decision model using the control register set after receiving the real-time status information. The target intelligent decision model may include a target policy selector and target low-level policies. The target low-level strategies may include a target control zone strategy, a target aggressive shooting strategy, and a target defensive shooting strategy.

The target control area strategy drives the unmanned aerial vehicle to be decided to seek a pursuit position behind the opposing unmanned aerial vehicle and to occupy a region of the state space, so that the opposing unmanned aerial vehicle is unlikely to escape the pursuit of the unmanned aerial vehicle to be decided.

The target aggressive shooting strategy encourages the unmanned aerial vehicle to be decided to attack the opposing unmanned aerial vehicle from the side and from the front, and the missile shooting reward is larger at closer range. The target aggressive shooting strategy therefore typically produces shots that cause the greatest damage, but leaves the unmanned aerial vehicle to be decided vulnerable to return fire from the opposing unmanned aerial vehicle agent. In terms of defense, the aggressive shooting strategy avoids being shot at close range more strongly than being shot at long range, which makes it a relatively less aggressive evader.

The target defensive shooting strategy means that the unmanned aerial vehicle agent to be decided values missile shots at close range and at long range equally, which produces behavior that effectively holds a position from which attack points are scored, even though the magnitude of the score may be low. In terms of defense, the defensive shooting strategy avoids being shot from all distances equally, which makes it equally sensitive to all damage and a relatively aggressive evader.

The target strategy selector can be used for selecting an optimal target low-level strategy from all target low-level strategies according to the current participation environment as a control strategy, and can be positioned at the top layer of the hierarchical structure, and can periodically select the control strategy at a preset frequency, wherein the preset frequency can be 10Hz or other values.

The control strategy may be one of the target control zone strategy, the target aggressive shooting strategy, and the target defensive shooting strategy, and is determined by the target strategy selector.
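The selection step can be sketched as follows. The sketch assumes the selector produces one score per low-level strategy and picks the highest; the scoring values, the strategy labels and the function names are hypothetical, and only the three strategy types and the 10 Hz example frequency come from the description.

```python
from typing import List, Sequence

# Labels for the three target low-level strategies named in the description.
LOW_LEVEL_POLICIES = ("control_zone", "aggressive_shooting", "defensive_shooting")

SELECT_HZ = 10  # example preset frequency from the text; other values are allowed


def select_policy(scores: Sequence[float]) -> str:
    """Return the low-level strategy with the highest score (assumed argmax)."""
    if len(scores) != len(LOW_LEVEL_POLICIES):
        raise ValueError("expected one score per low-level strategy")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LOW_LEVEL_POLICIES[best]


def selection_steps(duration_ticks: int, ticks_per_second: int) -> List[int]:
    """Tick indices at which the selector runs, given a simulation tick rate."""
    if ticks_per_second % SELECT_HZ != 0:
        raise ValueError("tick rate must be a multiple of the selection frequency")
    stride = ticks_per_second // SELECT_HZ
    return list(range(0, duration_ticks, stride))


if __name__ == "__main__":
    print(select_policy([0.1, 0.7, 0.2]))
    print(selection_steps(duration_ticks=100, ticks_per_second=50))
```

Between two selection steps the chosen low-level strategy stays in control, which is what places the selector at the top of the hierarchy.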

After that, the control register set corresponding to the FPGA chip 2 may return the control policy obtained by the target policy selector in the target intelligent decision model to the CPU1 through the data interface and through communication between the PCIE interface 21 and the PCIE system 3.

After receiving the control policy, the CPU1 may use it to control the actions of the unmanned aerial vehicle to be decided. The action control includes control of devices such as the ailerons, elevators, rudder and throttle of the unmanned aerial vehicle to be decided.

The intelligent decision-making power acceleration system of the unmanned aerial vehicle provided by the embodiment of the invention comprises an embedded heterogeneous hardware structure constructed based on a central processing unit and an FPGA chip, wherein the central processing unit and the FPGA chip are interconnected based on an XDMA system; the XDMA system comprises a PCIE system positioned at one side of the central processor and a PCIE interface corresponding to the PCIE system on the FPGA chip, wherein a first base address register space and a second base address register space are configured in the PCIE system, the first base address register space is used for accessing a control register group corresponding to the FPGA chip, and the second base address register space is used for accessing a data interface of the FPGA chip; a target intelligent decision model is deployed on the FPGA chip; the central processing unit is used for acquiring real-time state information of the unmanned aerial vehicle to be decided and transmitting the real-time state information to a data interface of the FPGA chip through the PCIE system; the FPGA chip is used for calling a target intelligent decision model by utilizing the control register set, inputting real-time state information into the target intelligent decision model, selecting an optimal target low-level strategy from all target low-level strategies by a target strategy selector in the target intelligent decision model based on the real-time state information as a control strategy, and returning the control strategy to the central processor through the PCIE interface; the central processing unit is also used for controlling the actions of the unmanned aerial vehicle to be decided based on the control strategy. The system meets various types of aviation embedded computing tasks by means of an embedded heterogeneous hardware structure constructed by a central processing unit and an FPGA chip. 
According to the system, the advantages of the central processor and the FPGA chip are combined, tasks are decomposed and executed on the central processor and the FPGA chip, and the target intelligent decision model is deployed on the FPGA chip, so that the processing cost of the tasks can be reduced, the task processing efficiency is improved, the operation efficiency of the target intelligent decision model is improved, and the intelligent decision efficiency of the unmanned aerial vehicle is further improved. Moreover, the embedded heterogeneous hardware structure of the system can realize high-speed communication through the XDMA system, and the problem of transmission delay caused by task decomposition is reduced.

On the basis of the above embodiment, the control register set includes: a control register, a status register, an input data register, and an output data register;

the FPGA chip further comprises a programmable logic unit;

Specifically, the input data register in the control register set is used for storing real-time state information, the control register is used for indicating whether the programmable logic unit starts to call the target intelligent decision model for calculation, the state register is used for indicating whether the programmable logic unit finishes calculation, and the output data register is used for storing calculation results.

As shown in fig. 2, the programmable logic unit may first write real-time status information to the input data register.

Then, the programmable logic unit may write 1 to the start bit (start) of the control register, call the target intelligent decision model for calculation, and write the calculation result into the output data register.

Thereafter, the programmable logic unit may read a calculation status (busy) bit of the status register, which is used to indicate whether the programmable logic unit has completed a calculation.

When the calculation status bit is cleared, the programmable logic unit has finished the calculation, and the calculation result in the output data register is the control strategy. When the calculation status bit is not cleared, the programmable logic unit has not finished the calculation, and the calculation status bit of the status register continues to be read until it is cleared.

Finally, when the calculation status bit is cleared, the programmable logic unit can obtain the control strategy by reading the output data register and output the control strategy.
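The write, start, poll and read sequence described above can be sketched against a software stand-in for the control register set. The class `FakeRegisterFile`, the bit positions of the start and busy bits, the 4-byte register stride and the placeholder "model" are all assumptions for illustration; real hardware access would go through the first base address register space instead.

```python
from typing import Dict, List, Sequence

CFG_CTRL_REG, CFG_STATUS_REG = 0x0, 0x4
CFG_DIN_BASE, CFG_DOUT_BASE = 0x100, 0x200
START_BIT = 0x1   # assumed position of the start bit
BUSY_BIT = 0x1    # assumed position of the calculation status (busy) bit


class FakeRegisterFile:
    """Software stand-in for the FPGA control register set (not real hardware)."""

    def __init__(self, busy_reads: int = 3) -> None:
        self.mem: Dict[int, int] = {}
        self.busy_reads = busy_reads
        self._remaining = 0
        self._pending = False

    def write(self, offset: int, value: int) -> None:
        self.mem[offset] = value & 0xFFFFFFFF
        if offset == CFG_CTRL_REG and value & START_BIT:
            self._remaining = self.busy_reads
            self._pending = True

    def read(self, offset: int) -> int:
        if offset == CFG_STATUS_REG:
            if self._remaining > 0:
                self._remaining -= 1
                return BUSY_BIT
            if self._pending:
                self._finish()
            return 0
        return self.mem.get(offset, 0)

    def _finish(self) -> None:
        # Placeholder for the decision model: index of the largest input word.
        inputs = [self.mem.get(CFG_DIN_BASE + 4 * i, 0) for i in range(8)]
        self.mem[CFG_DOUT_BASE] = inputs.index(max(inputs))
        self._pending = False


def run_decision(regs: FakeRegisterFile, state_words: Sequence[int],
                 max_polls: int = 1000) -> List[int]:
    """Write inputs, set the start bit, poll until busy clears, read outputs."""
    if len(state_words) != 8:
        raise ValueError("expected 8 input words")
    for i, word in enumerate(state_words):
        regs.write(CFG_DIN_BASE + 4 * i, word)
    regs.write(CFG_CTRL_REG, START_BIT)
    for _ in range(max_polls):
        if (regs.read(CFG_STATUS_REG) & BUSY_BIT) == 0:
            return [regs.read(CFG_DOUT_BASE + 4 * i) for i in range(4)]
    raise TimeoutError("calculation status bit was never cleared")


if __name__ == "__main__":
    print(run_decision(FakeRegisterFile(), [5, 1, 9, 2, 0, 0, 0, 0]))
```

The poll limit is a design choice added here so that a stalled accelerator produces an error instead of blocking the caller indefinitely.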

On the basis of the embodiment, the IP interface of the target intelligent decision model comprises a control interface, an input signal interface, an output signal interface and an output state interface;

Specifically, as shown in fig. 3, the IP interfaces of the target intelligent decision model include a control interface (ap_ctrl), an input signal interface, an output signal interface, and an output state interface.

The control interface may be adapted to receive a start signal (ap_start) and to output an end indication signal (ap_done) and an IP status signal (ap_idle). The starting signal is used for indicating the target intelligent decision model to start calculation, the ending indicating signal is used for indicating the target intelligent decision model to finish calculation, and the IP state signal is used for indicating whether the target intelligent decision model is normally called.

The input signal interface may include input1[63:0] to input8[63:0], each of which may carry a 64-bit input. The input signal interface may be used to access real-time status information.
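One way to map the real-time state onto eight 64-bit inputs is sketched here. The encoding (IEEE 754 double precision), the field order and the choice of exactly eight scalar fields are assumptions; the description lists the kinds of state information but does not specify how they are laid out across input1 to input8.

```python
import struct
from dataclasses import astuple, dataclass
from typing import List, Sequence


@dataclass
class UavState:
    """Hypothetical layout: eight scalar state values, one per 64-bit input."""
    speed: float
    acceleration: float
    x: float
    y: float
    z: float
    track_pitch: float
    track_yaw: float
    angle_rate: float


def pack_state(state: UavState) -> List[int]:
    """Encode each field as the 64-bit pattern of an IEEE 754 double."""
    return [struct.unpack("<Q", struct.pack("<d", v))[0] for v in astuple(state)]


def unpack_words(words: Sequence[int]) -> List[float]:
    """Inverse of pack_state, useful for checking a round trip."""
    return [struct.unpack("<d", struct.pack("<Q", w))[0] for w in words]


if __name__ == "__main__":
    s = UavState(250.0, 1.5, 100.0, -20.0, 3000.0, 0.1, -0.2, 0.01)
    words = pack_state(s)
    print([hex(w) for w in words])
    print(unpack_words(words))
```

A fixed-point encoding would be an equally plausible choice for an FPGA design; doubles are used here only because they fill a 64-bit word exactly.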

The output state interface may include output1_ap_vld to output4_ap_vld. The output state interface may be used to output the valid state of the control strategy.

The output signal interface may include output1[63:0] to output4[63:0], each of which may carry a 64-bit output. The output signal interface may be used to latch and output the control strategy when the control interface outputs the end indication signal and the valid state is valid.

In addition, the IP interface may also include a clock interface (ap_clk) and a reset interface (ap_rst). The clock of the clock interface is 300MHz.

Thus, the working process of the target intelligent decision model may include: accessing real-time state information -> asserting the start signal -> waiting for the end indication signal -> latching and outputting the control strategy.

The deployment structure of the target intelligent decision model on the FPGA chip 2 is shown in fig. 4, and the FPGA chip 2 includes a clock buffer (BUFGCE), a first clock domain crossing (CDC) unit, a second clock domain crossing (CDC) unit, and an axis data generation (axis data gen) unit.

The clock buffer may receive a 300 MHz user clock signal (user_clk) and produce a 300 MHz buffered signal (alg_clk). The buffered signal is input to the first clock domain crossing unit and to the axis data generation unit, respectively.

The XDMA system may transmit the real-time status information received by the CPU1 to the data interface of the FPGA chip 2, so that the real-time status information is stored in the storage space corresponding to the FPGA chip 2. Before passing through the XDMA system, the clock signal frequency of the real-time status information is 100 MHz; after passing through the XDMA system, it becomes 250 MHz.

After passing through the XDMA system, the real-time state information, the 250 MHz external clock signal and the buffered signal are input to the first clock domain crossing unit. The output of the first clock domain crossing unit and the buffered signal are input to the axis data generation unit, which outputs 400 MHz data as the input of the target intelligent decision model. The target intelligent decision model outputs a 400 MHz control strategy, which is converted into a 250 MHz control strategy by the second clock domain crossing unit and stored in the storage space corresponding to the FPGA chip 2.
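As a host-side illustration of the data that travels along this path, the following Python sketch packs eight 64-bit input words into a transfer buffer and unpacks four 64-bit output words. The description does not state the encoding of the words, so the use of little-endian IEEE-754 doubles, and the function names pack_state and unpack_strategy, are assumptions made only for this example.

```python
import struct

N_INPUTS = 8   # input1 .. input8, 64 bits each (from the description)
N_OUTPUTS = 4  # output1 .. output4, 64 bits each (from the description)


def pack_state(values):
    """Pack eight state values into a 64-byte buffer.

    Assumption: each 64-bit word carries a little-endian double.
    """
    if len(values) != N_INPUTS:
        raise ValueError(f"expected {N_INPUTS} values, got {len(values)}")
    return struct.pack("<8d", *values)


def unpack_strategy(buffer):
    """Unpack a 32-byte buffer into four control-strategy values."""
    if len(buffer) != N_OUTPUTS * 8:
        raise ValueError(f"expected {N_OUTPUTS * 8} bytes, got {len(buffer)}")
    return list(struct.unpack("<4d", buffer))
```

A fixed-size buffer of this kind would be written to and read from the storage space of the FPGA chip; the actual transfer mechanism is outside the scope of the sketch.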

On the basis of the above embodiment, the CPU1 is further configured to obtain, in a simulated combat scenario between two unmanned aerial vehicle agents, the motion state information of both sides' unmanned aerial vehicle models during the simulated combat, perform damage calculation and win/loss judgment on the simulated combat between both sides' unmanned aerial vehicle models, construct a training data set, and transmit the training data set to the FPGA chip through the PCIE system;

the FPGA chip 2 is further configured to perform hierarchical training, based on the training data set and using a target deep Q network, on the strategy selector and each low-level strategy in the initial intelligent decision model, to obtain the target intelligent decision model.

Specifically, the CPU1 may obtain, in the simulated combat scenario between the two unmanned aerial vehicle agents, the motion state information of both sides' unmanned aerial vehicle models during the simulated combat, perform damage calculation and win/loss judgment on the simulated combat between both sides' unmanned aerial vehicle models, construct a training data set, and transmit the training data set to the FPGA chip 2 through the PCIE system 3.

In the simulated combat scenario there are two opposing sides, red and blue. Each side has a defense area that must be protected from penetration by enemy aircraft, and an unmanned aerial vehicle model for interception and attack; the two sides have completely symmetrical combat conditions and mission objectives. In the embodiment of the invention, the red side may be taken as the own side and the blue side as the opposing side.

The combat objective assigned to the unmanned aerial vehicle agents of both sides is to intrude into the opposing side's defense area or to shoot down the opposing side's unmanned aerial vehicle model. When either side completes this task, the simulated combat ends and a win/loss result is generated.

It can be understood that, in the simulated combat scenario, the scenario simulation settings can be configured, and the three-dimensional situation information of the current combat and the flight state information of the unmanned aerial vehicle models can be displayed.

The unmanned aerial vehicle model may be a multi-degree-of-freedom unmanned aerial vehicle motion model used to simulate a real unmanned aerial vehicle. Each side's unmanned aerial vehicle model is controlled by that side's unmanned aerial vehicle agent. The unmanned aerial vehicle agent is a virtual artificial intelligence system that controls the motion of the unmanned aerial vehicle model in order to carry out the simulated combat between the two sides.

Here, the unmanned aerial vehicle model may be controlled by means of target waypoints. After receiving the information on a target waypoint, the unmanned aerial vehicle agent can automatically plan a path according to the position coordinates of the target waypoint, the current position coordinates and the body attitude, and can automatically control the unmanned aerial vehicle model to fly along the target flight path.

In the embodiment of the invention, the unmanned aerial vehicle model may be a six-degree-of-freedom unmanned aerial vehicle motion model, the six degrees of freedom being the flight speed, the flight-path pitch angle, the flight-path yaw angle and the three-dimensional position coordinates. Accordingly, the motion state information of both sides' unmanned aerial vehicle models during the simulated combat may include aerodynamic information, position information and attitude information. The aerodynamic information includes the flight speed and acceleration of the unmanned aerial vehicle models, the position information includes their three-dimensional position coordinates, and the attitude information includes their flight-path pitch angle, flight-path yaw angle and angular rates.
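To make the six state variables concrete, the following Python sketch advances such a state by one time step. The description does not give the equations of motion, so the point-mass kinematics, the forward-Euler integration and the names MotionState and step are assumptions chosen for illustration; angles are taken in radians and the z axis is taken as pointing upward.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionState:
    speed: float  # flight speed, m/s
    pitch: float  # flight-path pitch angle, rad
    yaw: float    # flight-path yaw angle, rad
    x: float      # three-dimensional position coordinates, m
    y: float
    z: float


def step(s: MotionState, accel: float, pitch_rate: float,
         yaw_rate: float, dt: float) -> MotionState:
    """Advance the state by dt seconds (assumed point-mass kinematics)."""
    return MotionState(
        speed=s.speed + accel * dt,
        pitch=s.pitch + pitch_rate * dt,
        yaw=s.yaw + yaw_rate * dt,
        # The velocity vector is resolved along the axes using the two angles.
        x=s.x + s.speed * math.cos(s.pitch) * math.cos(s.yaw) * dt,
        y=s.y + s.speed * math.cos(s.pitch) * math.sin(s.yaw) * dt,
        z=s.z + s.speed * math.sin(s.pitch) * dt,
    )
```

The acceleration and the two angular rates passed to step correspond to the acceleration and angle change rates listed in the motion state information.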

In the simulated combat environment, each unmanned aerial vehicle agent can acquire the motion state information of its own side's unmanned aerial vehicle model. Furthermore, the motion state information of both sides' unmanned aerial vehicle models is mutually transparent, that is, the motion state information of one side's unmanned aerial vehicle model can be obtained by the other side's unmanned aerial vehicle agent.

When damage calculation and win/loss judgment are performed on the simulated combat between both sides' unmanned aerial vehicle models, the missile attack range of the unmanned aerial vehicle agent needs to be configured. The missile attack range may be a three-dimensional region bounded by a conical surface and a spherical surface. The central axis of the conical surface coincides with the nose direction of the unmanned aerial vehicle model, the angle of the conical surface may be 80 degrees, and the radius of the spherical surface may be 500 meters. The angle of the conical surface and the radius of the spherical surface respectively limit the maximum firing inclination angle and the maximum attack distance of the unmanned aerial vehicle model.

Fig. 5 is a schematic diagram of the missile attack range of the unmanned aerial vehicle. In Fig. 5, 4 denotes the own-side unmanned aerial vehicle model, and 5 denotes the opposing unmanned aerial vehicle model.
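The membership test for such a cone-and-sphere region can be sketched in Python as follows. The description does not say whether the 80-degree value is the full apex angle or the half-angle of the cone; this sketch assumes the full apex angle, so a target qualifies when it lies within 40 degrees of the nose direction. The function name in_attack_range and its parameters are hypothetical.

```python
import math


def in_attack_range(own_pos, nose_dir, target_pos,
                    cone_angle_deg=80.0, radius_m=500.0):
    """Return True if target_pos lies inside the cone-and-sphere region.

    Assumption: cone_angle_deg is the full apex angle of the cone.
    """
    rel = [t - o for t, o in zip(target_pos, own_pos)]
    dist = math.sqrt(sum(c * c for c in rel))
    nose_norm = math.sqrt(sum(c * c for c in nose_dir))
    if nose_norm == 0.0:
        raise ValueError("nose_dir must be a non-zero vector")
    if dist > radius_m:
        return False          # beyond the maximum attack distance
    if dist == 0.0:
        return True           # coincident positions are treated as inside
    cos_off = sum(r * n for r, n in zip(rel, nose_dir)) / (dist * nose_norm)
    cos_off = max(-1.0, min(1.0, cos_off))  # guard against rounding error
    off_axis_deg = math.degrees(math.acos(cos_off))
    return off_axis_deg <= cone_angle_deg / 2.0
```

If the 80-degree value is instead meant as the half-angle, the final comparison would use cone_angle_deg directly.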

When damage calculation is performed, the damage value may be represented by the reduction in the health value (blood volume) of the unmanned aerial vehicle agent. When the opposing unmanned aerial vehicle model is located within the missile attack range of the own-side unmanned aerial vehicle agent, the own-side unmanned aerial vehicle agent is considered to be attacking the opposing unmanned aerial vehicle agent, and the health value of the opposing unmanned aerial vehicle agent is reduced. The rate of reduction may be set as needed: it may be a constant value, or it may be set adaptively according to the duration of the attack, which is not specifically limited herein.

When win/loss judgment is performed, if the health value (blood volume) of the opposing unmanned aerial vehicle agent is 0, the own side is determined to have won; if the health value of the own-side unmanned aerial vehicle agent is 0, the own side is determined to have lost; and if the maximum combat duration is reached, the two sides are determined to have drawn. The maximum combat duration may be set as needed, for example, T = 300 s.
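The damage and judgment rules above can be sketched in Python as follows. The function names apply_damage and judge, the constant reduction rate and the result labels are assumptions for illustration. The description does not say what happens when both health values reach 0 in the same step; this sketch checks the opposing side first, which is also an assumption.

```python
def apply_damage(health, under_attack, rate_per_s, dt):
    """Reduce a health value while the agent is inside the attacker's range.

    Assumption: a constant reduction rate; the description also allows a rate
    that adapts to the duration of the attack.
    """
    if not under_attack:
        return health
    return max(0.0, health - rate_per_s * dt)


def judge(own_health, opp_health, elapsed_s, max_duration_s=300.0):
    """Return 'own_win', 'own_loss', 'draw' or 'ongoing'."""
    if opp_health <= 0.0:
        return "own_win"      # opposing agent's health value is 0
    if own_health <= 0.0:
        return "own_loss"     # own agent's health value is 0
    if elapsed_s >= max_duration_s:
        return "draw"         # maximum combat duration reached
    return "ongoing"
```

For example, with a rate of 10 per second and an initial health value of 100, ten one-second steps under attack bring the health value to 0 and end the combat.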

Based on the motion state information, the damage calculation results, the win/loss judgment results and other data from a complete simulated combat, a training data set can be constructed and provided to the FPGA chip 2 for training the initial intelligent decision model. It will be appreciated that the motion state information is determined without simulating sensor noise.

The FPGA chip 2 may receive the training data set and use a deep Q network (DQN) to perform hierarchical training on the strategy selector and each low-level strategy in the initial intelligent decision model, to obtain the target intelligent decision model.

The low-level strategies include a control area strategy, an aggressive shooting strategy and a defensive shooting strategy. Training the control area strategy yields the target control area strategy, training the aggressive shooting strategy yields the target aggressive shooting strategy, and training the defensive shooting strategy yields the target defensive shooting strategy.

In the embodiment of the invention, when the strategy selector is trained, the parameters of each target low-level strategy need to be frozen, so that training the strategy selector involves no complicating factors other than the target low-level strategies themselves. This simplifies the learning problem and allows the own-side unmanned aerial vehicle agent to be trained and reused in a modular manner.
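The relationship between the strategy selector and the frozen low-level strategies can be sketched in Python as follows. This is only a structural illustration: the class StrategySelector, the callable q_fn and the representation of a low-level strategy as a function from state to action are assumptions, and no training is performed in the sketch.

```python
from typing import Callable, Dict, Sequence, Tuple

# A low-level strategy maps a state to an action (hypothetical representation).
LowLevelStrategy = Callable[[Sequence[float]], Sequence[float]]


class StrategySelector:
    """Chooses one frozen low-level strategy per decision step."""

    def __init__(self,
                 strategies: Dict[str, LowLevelStrategy],
                 q_fn: Callable[[Sequence[float], str], float]) -> None:
        # The low-level strategies are stored as given and never modified here,
        # which corresponds to freezing their parameters.
        self._strategies = dict(strategies)
        # q_fn stands in for the selector's own trainable Q function.
        self._q_fn = q_fn

    def select(self, state: Sequence[float]) -> str:
        """Return the name of the low-level strategy with the highest Q value."""
        return max(self._strategies, key=lambda name: self._q_fn(state, name))

    def act(self, state: Sequence[float]) -> Tuple[str, Sequence[float]]:
        name = self.select(state)
        return name, self._strategies[name](state)
```

Because only q_fn would change during selector training, the same low-level strategies can be reused unchanged with a different selector.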

When the strategy selector is trained, the reward function used is sparse and may include an external reward. The external reward is the reward given by the environment under each low-level strategy. It may be determined from the position of the own-side unmanned aerial vehicle agent relative to the opposing unmanned aerial vehicle agent, and its aim is to place the opposing unmanned aerial vehicle model within the weapon engagement zone of the own-side unmanned aerial vehicle agent.

The target deep Q network used when training the strategy selector and each low-level strategy in the initial intelligent decision model replaces the table lookup by which Q-learning computes Q values with a neural network that approximates the value function. This addresses the difficulty of computing Q values when the input motion state information has very high dimensionality. In addition, two networks are used to reduce the dependency between the computation of the target Q value and the parameters of the Q network being updated, which mitigates the difficulty the algorithm has in converging.

The target deep Q network may include an evaluation network and a target network with identical structures. The evaluation network is used to compute the estimated Q value and has its structural parameters updated; the target network is used to compute the target Q value. The structural parameters of the target network are not updated iteratively; instead, the structural parameters of the evaluation network are copied to it at intervals, which reduces the correlation between the target Q value and the estimated Q value.
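The two-network arrangement can be sketched in Python as follows. The description does not give the update formula, so the standard one-step target, reward plus discount times the maximum target-network Q value of the next state, is used here as an assumption. The names td_target and TwoNetworks, the discount of 0.99 and the synchronization interval are likewise hypothetical.

```python
import copy


def td_target(reward, next_q_from_target_net, gamma=0.99, done=False):
    """One-step target Q value computed from the target network's outputs."""
    if done:
        return reward
    return reward + gamma * max(next_q_from_target_net)


class TwoNetworks:
    """Holds evaluation-network and target-network parameters."""

    def __init__(self, params, sync_every=100):
        self.eval_params = params                    # updated at every step
        self.target_params = copy.deepcopy(params)   # copied only at intervals
        self.sync_every = sync_every
        self.steps = 0

    def after_update(self):
        """Call after each update of the evaluation network."""
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_params = copy.deepcopy(self.eval_params)
```

Between two copies the target values stay fixed while the evaluation network changes, which is the decoupling described above.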

To estimate the Q value more accurately, a competing (dueling) network structure is added to both the evaluation network and the target network. With the competing network structure, the target deep Q network can assess the value of a state even when the action has no effect on the environment, and thereby searches for the optimal low-level strategy for the combat between the two sides.

The competing network structure may include a state value network and an action advantage network, both connected to the hidden layer. Fig. 6 is a schematic diagram of the competing network structure. As shown in Fig. 6, the competing network structure may include an input layer, a hidden layer, the state value network and the action advantage network connected to the hidden layer, and an output layer. During maneuvering, the unmanned aerial vehicle agent acquires motion state information from the environment as the input of the evaluation network. The features produced by the hidden layer are fed into the state value network and the action advantage network respectively for further processing, the outputs of the two networks are then added, and the Q value is finally output.
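The final combination step can be sketched in Python as follows. The plain sum follows the description. The optional mean subtraction is not stated in the description; it is the aggregation commonly used in dueling network implementations and is included only as an assumption. The function name dueling_q is hypothetical.

```python
def dueling_q(state_value, advantages, subtract_mean=False):
    """Combine a scalar state value with per-action advantages into Q values.

    subtract_mean=False: Q(s, a) = V(s) + A(s, a), as in the description.
    subtract_mean=True:  Q(s, a) = V(s) + A(s, a) - mean(A(s, .)), an assumed
                         variant that keeps V and A identifiable.
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    offset = sum(advantages) / len(advantages) if subtract_mean else 0.0
    return [state_value + a - offset for a in advantages]
```

Both variants rank the actions identically for a given state, since they differ only by a constant offset across actions.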

In the embodiment of the invention, the training process does not require a rule base to be established, so the resulting target intelligent decision model is general, robust and accurate, achieves a high success rate in unmanned aerial vehicle decision-making, and avoids the influence of subjectivity and experience on those decisions. The low-level strategies and the strategy selector are trained hierarchically, which shortens the learning period and allows complex maneuvers to be handled. In addition, the evaluation network and the target network of the target deep Q network both include a competing network structure comprising a state value network and an action advantage network connected to the hidden layer, which enables accurate estimation of the Q value and further improves the accuracy of the target intelligent decision model. Moreover, through cooperation between the CPU1 and the FPGA chip 2, deployment and collaborative optimization of the target intelligent decision model can be completed without loss of precision.

On the basis of the above embodiment, an AXI-Lite (Advanced eXtensible Interface Lite) slave interface is configured on the FPGA chip 2. The AXI-Lite slave interface may serve as the interface provided to the user logic and is connected to the control register set. Transfers on different channels are set so that they do not interfere with one another and can therefore proceed simultaneously, and setting different read and write channel IDs ensures that the upper limit on independent transfers within the same channel reaches the corresponding value, thereby improving transfer throughput.

The CPU1 and the AXI-Lite slave interface may be connected through an AXI-Lite bus, and the CPU1 may send an access request to the AXI-Lite slave interface using the first base address register space. The AXI-Lite bus is part of the AXI bus protocol. The AXI bus protocol is an on-chip bus oriented toward high performance, high bandwidth and low latency: its address/control and data phases are separate, it supports unaligned data transfers, a burst transfer requires only the start address, the read and write data channels are separate, outstanding and out-of-order accesses are supported, and timing closure is easier to achieve.

The FPGA chip 2 may receive the access request through the AXI-Lite slave interface, convert it into a read/write request for the control register set, and access the control register set with that read/write request. Here, the AXI-Lite slave interface may act as a read-first register access interface.
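The conversion of an access request into a register read or write can be sketched in Python as follows. The description names a control register, a status register, input data registers and output data registers but gives no register map, so every offset below is hypothetical, as are the class name ControlRegisterSet, the request dictionary format and the choice of which registers are read-only.

```python
class ControlRegisterSet:
    """Toy model of the control register set behind the slave interface."""

    # Hypothetical byte offsets with an 8-byte stride (64-bit registers).
    CTRL, STATUS, IN_BASE, OUT_BASE = 0x00, 0x08, 0x10, 0x50

    def __init__(self):
        inputs = [self.IN_BASE + 8 * i for i in range(8)]
        outputs = [self.OUT_BASE + 8 * i for i in range(4)]
        self.regs = {off: 0 for off in [self.CTRL, self.STATUS] + inputs + outputs}
        # Assumption: status and output registers cannot be written by the host.
        self.read_only = {self.STATUS, *outputs}

    def handle(self, request):
        """Translate an access request into a register read or write."""
        kind, offset = request["kind"], request["offset"]
        if offset not in self.regs:
            raise KeyError(f"no register at offset {offset:#x}")
        if kind == "read":
            return self.regs[offset]
        if kind == "write":
            if offset in self.read_only:
                raise PermissionError(f"register {offset:#x} is read-only")
            self.regs[offset] = request["value"] & ((1 << 64) - 1)
            return None
        raise ValueError(f"unknown request kind: {kind!r}")
```

A host would, for instance, write the eight input data registers, set a start bit in the control register and then poll the status register; that sequencing is not modelled here.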

Based on the above embodiment, the data interface of the FPGA chip 2 may be an AXI interface, which is used to access the static random access memory on the FPGA chip and the corresponding DDR controller of the FPGA chip through the AXI bus.

On the basis of the above embodiment, the DDR controller may include a control module and a physical interface. The control module may receive the data transferred over the AXI bus, generate read/write access requests, and perform read/write access to the DDR through the physical interface.

The control module may be responsible for timing parameter management and refresh of the DDR. It may merge read and write requests on the bus to reduce redundant bus accesses, and may reorder commands to improve utilization of the DDR data bus.

Here, the DDR may be DDR4 SDRAM (Synchronous Dynamic Random-Access Memory).

The physical interface may provide a high-speed access interface to the DDR, including hard and soft cores and their necessary calibration logic inside the FPGA chip 2. The calibration logic can ensure the accuracy of the timing parameters of the interface hard core.

Based on the above embodiments, the target intelligent decision model may be generated in a High-level Synthesis (HLS) manner.

On the basis of the above embodiment, the CPU1 may be configured with a user interface similar to a first-in first-out (FIFO) interface, in which requests and data are responded to in order. The user interface logic buffers the data of the original controller interface and, after appropriate reordering, provides it to the user logic of the FPGA chip 2.

The CPU1 may acquire the real-time status information through the user interface, and transmit the real-time status information to the FPGA chip 2 through the PCIE system 3.

On the basis of the above embodiment, the embedded heterogeneous hardware structure is verified through register transfer level (RTL) simulation, and the hierarchically trained target intelligent decision model is tested, so as to evaluate the performance of the target intelligent decision model and verify its acceleration effect in the embedded heterogeneous environment.

As shown in fig. 7, the target intelligent decision model is deployed and tested on the FPGA chip 2, and the test code flow mainly comprises the following steps:

Step 1: Start timing, using the clock function to return a timestamp before the computation starts.

Step 2: Determine the structural parameters of the target intelligent decision model and perform the computation by directly calling the network function, which has 8 input parameters and 4 output parameters.

Step 3: Stop after 1,000,000 computation cycles.

Step 4: Call the clock function to obtain the timestamp after the computation is completed.

Step 5: Stop timing and output the calculation result.
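The test flow above can be sketched in Python as follows. The description refers to a clock function and a network function without giving their code, so time.perf_counter is used here as the timer, and the function network is a hypothetical placeholder with 8 inputs and 4 outputs rather than the actual decision model.

```python
import time


def network(inputs):
    """Hypothetical placeholder for the network function (8 in, 4 out)."""
    if len(inputs) != 8:
        raise ValueError("the network function takes 8 input parameters")
    total = sum(inputs)
    return [total, total * 0.5, max(inputs), min(inputs)]


def benchmark(fn, inputs, cycles=1_000_000):
    """Time `cycles` repeated calls of fn and return (last result, seconds)."""
    start = time.perf_counter()      # timestamp before the computation
    result = None
    for _ in range(cycles):          # repeated computation cycles
        result = fn(inputs)
    end = time.perf_counter()        # timestamp after the computation
    return result, end - start
```

Calling benchmark(network, [1.0] * 8) returns the last result together with the elapsed time in seconds; the elapsed time of this placeholder says nothing about the performance of the real model.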

The correctness of the embedded heterogeneous hardware structure is verified through RTL simulation. Before simulation, a waveform file and test vectors are created using a waveform editor, HDL or the like; the simulation produces a report file and output signal waveforms, from which the changes of each node signal can be observed. Stimulus data for the target intelligent decision model is generated in C++ and used as the input of the embedded heterogeneous hardware structure. The result of the embedded heterogeneous hardware structure and the result of the target intelligent decision model are then computed and compared to verify whether the embedded heterogeneous hardware structure is correct.
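The comparison step can be sketched in Python as follows. The description does not state whether the comparison is exact or tolerance-based, so the relative and absolute tolerances, and the function name compare_results, are assumptions made for illustration.

```python
import math


def compare_results(hw_outputs, ref_outputs, rel_tol=1e-6, abs_tol=1e-9):
    """Return the indices at which hardware and reference outputs disagree.

    An empty list means the hardware result matches the reference model
    within the assumed tolerances.
    """
    if len(hw_outputs) != len(ref_outputs):
        raise ValueError("output vectors must have the same length")
    return [
        i for i, (hw, ref) in enumerate(zip(hw_outputs, ref_outputs))
        if not math.isclose(hw, ref, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

If the hardware uses a fixed-point or reduced-precision data type, the tolerances would need to reflect that quantization.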

The IMX6, Nvidia Xavier and RISC-V Umatrix hardware platforms serve as the comparison group. The code runs under Linux, and the test results on these platforms are compared with the FPGA-accelerated algorithm. PowerPC+VCU118 serves as the accelerator platform: the computation logic is deployed on the VCU118 FPGA development board, and the PowerPC acts as the master CPU providing data input and control functions. Fig. 8 compares the experimental results of the target intelligent decision model on the different hardware platforms. The target intelligent decision model can be compiled on the PowerPC+VCU118 hardware platform. Compared with running on hardware platforms such as IMX6, Nvidia Xavier and RISC-V Umatrix, the hardware-accelerated target intelligent decision model is more efficient, and its running time is shortened to 7.283 s.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The intelligent decision-making power accelerating system of the unmanned aerial vehicle is characterized by comprising an embedded heterogeneous hardware structure constructed based on a central processing unit and an FPGA chip, wherein the central processing unit and the FPGA chip are interconnected based on an XDMA system;

a target intelligent decision model is deployed on the FPGA chip;

2. The unmanned aerial vehicle intelligent decision-making power acceleration system of claim 1, wherein the control register set comprises: a control register, a status register, an input data register, and an output data register;

the FPGA chip further comprises a programmable logic unit;

3. The unmanned aerial vehicle intelligent decision-making power acceleration system of claim 1, wherein the IP interfaces of the target intelligent decision-making model comprise a control interface, an input signal interface, an output signal interface, and an output status interface;

4. The intelligent decision-making power acceleration system of claim 1, wherein the central processor is further configured to obtain, in a simulated combat scenario between two unmanned aerial vehicle agents, the motion state information of both sides' unmanned aerial vehicle models during the simulated combat, perform damage calculation and win/loss judgment on the simulated combat between both sides' unmanned aerial vehicle models, construct a training data set, and transmit the training data set to the FPGA chip through the PCIE system;

5. The unmanned aerial vehicle intelligent decision-making power acceleration system of claim 4, wherein the embedded heterogeneous hardware structure is verified through register transfer level (RTL) simulation, and the target intelligent decision model after hierarchical training is tested.

6. The unmanned aerial vehicle intelligent decision-making power acceleration system of any one of claims 1-5, wherein an AXI-Lite slave interface is configured on the FPGA chip;

7. The unmanned aerial vehicle intelligent decision-making power acceleration system of any one of claims 1-5, wherein the data interface is an AXI interface for accessing a static random access memory on the FPGA chip and a corresponding DDR controller of the FPGA chip over an AXI bus.

8. The unmanned aerial vehicle intelligent decision-making power acceleration system of claim 7, wherein the DDR controller comprises a control module and a physical interface;

9. The unmanned aerial vehicle intelligent decision-making power acceleration system of any one of claims 1-5, wherein the target intelligent decision model is generated in a high-level synthesis manner.

10. The unmanned aerial vehicle intelligent decision-making power acceleration system of any one of claims 1-5, wherein the central processor is configured with a user interface;