CN113381824B - Underwater acoustic channel measuring method and device, unmanned underwater vehicle and storage medium - Google Patents


Info

Publication number
CN113381824B
CN113381824B
Authority
CN
China
Prior art keywords
underwater
acoustic channel
channel measurement
sample
strategy
Prior art date
Legal status
Active
Application number
CN202110639526.4A
Other languages
Chinese (zh)
Other versions
CN113381824A (en)
Inventor
任勇
夏照越
杜军
王景璟
李刚
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110639526.4A
Publication of CN113381824A
Application granted
Publication of CN113381824B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04B: TRANSMISSION
    • H04B 17/00: Monitoring; Testing
    • H04B 17/30: Monitoring; Testing of propagation channels
    • H04B 13/00: Transmission systems characterised by the medium used for transmission, not provided for in groups H04B3/00 - H04B11/00
    • H04B 13/02: Transmission systems in which the medium consists of the earth or a large mass of water thereon, e.g. earth telegraphy

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The application relates to an underwater acoustic channel measurement method and device, an unmanned underwater vehicle, and a storage medium. The method comprises the following steps: acquiring ocean parameters and local state information of the underwater vehicle; inputting the ocean parameters, the local state information, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within a preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round; and performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data. With this scheme, the current round's measurement strategy is obtained immediately once real-time local environment information is input into the multi-agent reinforcement learning model. Compared with a fixed transceiving communication system, this reduces transmission delay, improves measurement efficiency, reduces the influence of channel attenuation, improves the spatial coverage of underwater acoustic channel feature collection, enhances the control flexibility of that coverage, and yields high measurement accuracy.

Description

Underwater acoustic channel measuring method and device, unmanned underwater vehicle and storage medium
Technical Field
The application relates to the technical field of underwater acoustic channel measurement, in particular to an underwater acoustic channel measurement method and device, an unmanned underwater vehicle and a storage medium.
Background
Under the strategic background of the nation's vigorous development of marine resources and the marine economy, the concept of smart oceans is receiving wide attention from academia. Underwater acoustic communication is an essential key technology for realizing the remote transmission of underwater information and for supporting ocean information systems and applications such as the smart ocean and smart marine defense.
At present, underwater acoustic channel measurement mainly adopts a fixed transceiving communication system, in which a signal source and a hydrophone are fixed or suspended from a ship, and transmission characteristics of the underwater acoustic channel such as time-frequency interference, Doppler frequency shift, transmission delay, and channel attenuation are estimated by analyzing the received signals.
However, current underwater acoustic channel measurement suffers from low real-time measurement accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide an underwater acoustic channel measurement method, an apparatus, an unmanned underwater vehicle, and a storage medium, which can improve the real-time measurement efficiency and accuracy of an underwater acoustic channel.
In a first aspect, a method for measuring an underwater acoustic channel is provided, the method including:
acquiring ocean parameters and local state information of the underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicles within a preset range;
inputting the ocean parameters, the local state information, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round;
and performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data.
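The three steps above can be sketched as a single per-round loop. This is an illustrative stand-in, not the patent's implementation: `StubModel` and all field names (`next_position`, `partner_auv`, `execute`, and so on) are hypothetical placeholders for the pre-trained multi-agent reinforcement learning model and its strategy output.

```python
# A minimal sketch of one measurement round (sense -> decide -> measure).
# All class, function, and field names here are hypothetical.

def measurement_round(model, ocean_params, local_state, prev_round_data):
    # Step 1's inputs are gathered by the caller; step 2: the model maps them
    # to this round's measurement strategy; step 3: execute it to get data.
    strategy = model.decide(ocean_params, local_state, prev_round_data)
    data = strategy["execute"]()  # move, transmit/receive, record the channel
    return strategy, data

class StubModel:
    """Stand-in decision model returning a fixed strategy."""
    def decide(self, ocean_params, local_state, prev_round_data):
        x, y, z = local_state["position"]
        return {
            "next_position": (x + 10.0, y, z),   # where to move next
            "partner_auv": 3,                    # which AUV to communicate with
            "role": "transmitter",               # acts as sender this round
            "execute": lambda: {"channel_response": [0.9, 0.1, 0.0]},
        }

strategy, data = measurement_round(
    StubModel(),
    ocean_params={"temperature": 12.5, "salinity": 35.0, "flow_velocity": 0.3},
    local_state={"position": (0.0, 0.0, -10.0), "neighbors": []},
    prev_round_data=None,
)
```

In a real deployment the returned data would feed the next round as `prev_round_data`, closing the loop the claims describe.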
In one embodiment, after performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data, the method further includes:
when the current round's underwater acoustic channel measurement data meet a preset validity condition, acquiring the current ocean parameters and the local state information of all the underwater vehicles;
and training the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of each underwater vehicle, and the current round's underwater acoustic channel measurement strategy.
In one embodiment, the training process of the multi-agent reinforcement learning model comprises the following steps:
obtaining sample ocean parameters, sample state information of the underwater vehicle, and the previous round's sample underwater acoustic channel measurement strategy;
inputting the sample ocean parameters, the sample state information of the underwater vehicle, and the previous round's sample underwater acoustic channel measurement strategy into a sample strategy function to obtain the current round's sample measurement strategy;
performing underwater acoustic channel measurement according to the sample measurement strategy to obtain the current round's sample underwater acoustic channel measurement data;
computing a reward value corresponding to the sample measurement strategy from the sample underwater acoustic channel measurement data and a reference result;
and updating the sample strategy function according to the reward value until the difference between the previous round's reward value and the current round's reward value is smaller than a preset threshold, so as to obtain the multi-agent reinforcement learning model.
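The training loop above can be sketched as follows. This is a hedged toy version under stated assumptions: `ToyPolicy` collapses the sample strategy function to one scalar parameter, the "measurement" is simulated, and the reward is simply the negative distance to the reference result; the stopping rule, however, is the one claimed (stop when consecutive rewards differ by less than a preset threshold).

```python
# Toy sketch of the claimed training loop. The policy, reward form, and
# update rule are illustrative assumptions, not the patent's.

class ToyPolicy:
    def __init__(self, theta=0.0):
        self.theta = theta          # single scalar "policy parameter"
    def act(self, _sample_inputs):
        return self.theta           # the current round's sample strategy
    def update(self, reference, lr):
        # placeholder for a policy-gradient step: move toward the reference
        self.theta += lr * (reference - self.theta)

def train(policy, sample_inputs, reference, lr=0.5, threshold=1e-3, max_rounds=1000):
    prev_reward = None
    for _ in range(max_rounds):
        strategy = policy.act(sample_inputs)
        measurement = strategy                   # stand-in for measuring the channel
        reward = -abs(measurement - reference)   # reward vs. the reference result
        if prev_reward is not None and abs(reward - prev_reward) < threshold:
            break                                # converged: stop updating
        policy.update(reference, lr)
        prev_reward = reward
    return policy

policy = train(ToyPolicy(), sample_inputs=None, reference=1.0)
```

The reward gap halves each round here, so training halts after roughly ten updates with the parameter close to the reference.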
In one embodiment, the previous round's sample underwater acoustic channel measurement strategy comprises the previous round's sample underwater acoustic channel measurement strategies of all the underwater vehicles;
obtaining the sample ocean parameters, the sample state information of the underwater vehicle, and the previous round's sample underwater acoustic channel measurement strategy comprises the following steps:
obtaining the sample ocean parameters, the sample state information of the underwater vehicle, and the previous round's sample underwater acoustic channel measurement strategy of the current underwater vehicle;
and obtaining the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles.
In one embodiment, obtaining the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles includes:
training the respective strategy functions of the other underwater vehicles according to their historical sample underwater acoustic channel measurement strategies and the reference result;
and when the error between the sample underwater acoustic channel measurement data obtained through the respective strategy functions of the other underwater vehicles and the reference result is smaller than a preset error threshold, taking those functions as the strategy functions of the other underwater vehicles.
In one embodiment, updating the sample strategy function according to the reward value includes:
updating the sample strategy function according to the reward value, the previous round's sample underwater acoustic channel measurement strategy of the current underwater vehicle, and the sample underwater acoustic channel measurement strategies of the other underwater vehicles.
In one embodiment, updating the sample strategy function according to the reward value includes:
for each underwater vehicle, randomly sampling from the underwater vehicle's strategy function set, and determining the overall reward of the strategy function set for each sampled sub-strategy function;
if the overall reward is larger than a preset overall reward condition, determining the currently sampled sub-strategy function as the sample strategy function of the underwater vehicle;
and if the overall reward is less than or equal to the preset overall reward condition, updating the currently sampled sub-strategy function according to its update gradient, and continuing the random sampling.
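The sample-and-update procedure above can be sketched as follows. This is an illustrative assumption-laden toy: each sub-strategy function is reduced to one scalar, the overall reward function is supplied by the caller, and the "update gradient" is a fixed step; only the accept/update control flow mirrors the embodiment.

```python
# Sketch of the sub-strategy sampling update: sample a sub-strategy at random,
# accept it if the set's overall reward beats the preset condition, otherwise
# nudge it along a (placeholder) update gradient and resample.
import random

def select_sub_policy(policy_set, overall_reward, reward_bar, step=0.1, max_tries=1000):
    for _ in range(max_tries):
        idx = random.randrange(len(policy_set))       # random sampling
        if overall_reward(policy_set) > reward_bar:
            return policy_set[idx]                    # accept the sampled sub-strategy
        policy_set[idx] += step                       # gradient-style update, resample
    return None

random.seed(0)
policies = [0.0, 0.0, 0.0]
chosen = select_sub_policy(policies, overall_reward=sum, reward_bar=1.0)
```

With `sum` as the overall reward, each failed draw raises the set's total by the step size until the bar is cleared and the currently sampled sub-strategy is returned.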
In a second aspect, there is provided an underwater acoustic channel measurement apparatus, the apparatus comprising:
the acquisition module is used for acquiring ocean parameters and local state information of the underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicles within a preset range;
the decision-making module is used for inputting the ocean parameters, the local state information, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round;
and the measurement module is used for performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data.
In a third aspect, an unmanned underwater vehicle is provided, which includes a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the following steps:
acquiring ocean parameters and local state information of the underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicles within a preset range;
inputting the ocean parameters, the local state information, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round;
and performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring ocean parameters and local state information of the underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicles within a preset range;
inputting the ocean parameters, the local state information, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round;
and performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data.
According to the underwater acoustic channel measurement method and device, the unmanned underwater vehicle, and the storage medium, ocean parameters and local state information of the underwater vehicle are acquired; the ocean parameters, the local state information, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range are input into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round; and underwater acoustic channel measurement is performed according to that strategy to obtain the current round's underwater acoustic channel measurement data. With this scheme, the current round's measurement strategy is obtained immediately once real-time local environment information is input into the multi-agent reinforcement learning model. Compared with a fixed transceiving communication system, this reduces transmission delay, improves measurement efficiency, reduces the influence of channel attenuation, and improves the spatial coverage of underwater acoustic channel feature collection; because a plurality of unmanned underwater vehicles measure the underwater acoustic channel while moving, the control flexibility of the spatial coverage is enhanced, and the measurement accuracy is therefore high.
Drawings
FIG. 1 is a diagram of an embodiment of an underwater acoustic channel measurement method;
FIG. 2 is a flow diagram illustrating an embodiment of a method for underwater acoustic channel measurement;
FIG. 3 is a block diagram of an embodiment of an apparatus for measuring an underwater acoustic channel;
FIG. 4 is an internal structure diagram of the unmanned underwater vehicle in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Under the strategic background of the nation's vigorous development of marine resources and the marine economy, the concept of smart oceans is receiving wide attention from academia. Underwater acoustic communication is an essential key technology for realizing the remote transmission of underwater information and for supporting ocean information systems and applications such as the smart ocean and smart marine defense. However, so far, the spatio-temporal variation and evolution laws of underwater acoustic channel characteristics in wide-area, deep, and distant seas have no accurate model and lack a measured database, which has become the key bottleneck restricting underwater communication, networking, and even detection.
At present, underwater acoustic channel measurement mainly adopts a fixed transceiving communication system, in which a signal source and a hydrophone are fixed or suspended from a ship, and transmission characteristics of the underwater acoustic channel such as time-frequency interference, Doppler frequency shift, transmission delay, and channel attenuation are estimated by analyzing the received signals. This measurement method has four disadvantages: (1) the time coverage capability for channel feature acquisition is extremely limited; (2) the spatial coverage capability for channel feature acquisition is limited, and control over the spatial coverage lacks flexibility; (3) mobility is limited, so Doppler characteristic analysis arising from the mobility of underwater communication nodes cannot be supported; and (4) traditional measurement equipment is complex and measurement efficiency is low.
Multi-Agent Reinforcement Learning (MARL) is an important approach to the problem of multi-agent control in complex tasks. Compared with ordinary single-agent reinforcement learning, MARL can handle more complex multi-agent control problems in cooperation or competition scenarios. Commonly used MARL techniques often combine Deep Learning, Reinforcement Learning, and Graph Representation theory, and can thereby possess the advantages of all three: 1) the strong feature extraction and characterization capability of deep learning can accurately extract and abstractly express environmental features, and the resulting representation can be used directly for numerical processing; 2) the adaptability of reinforcement learning to the environment ensures that real-time control of the multiple agents can flexibly adapt to environmental changes; 3) graph representation theory can abstractly express the cooperation and competition relationships among the multiple agents, providing a solid mathematical foundation for problem modeling based on games or Markov decision processes.
Therefore, the invention provides an underwater acoustic channel measurement method that uses the strong environmental adaptability and environment characterization capability of multi-agent reinforcement learning to improve the efficiency and accuracy of underwater acoustic channel measurement.
The underwater acoustic channel measurement method provided by the application can be applied to the application environment shown in fig. 1. In an underwater environment, the underwater acoustic channel is time-varying and affected by pronounced multipath and Doppler effects, making it extremely difficult to measure in real time. In the embodiment of the application, a plurality of Autonomous Underwater Vehicles (AUVs, i.e. unmanned underwater vehicles) therefore form an AUV cluster, and each AUV carries a hydrophone and a transmitter, so that each AUV can act as a sending end to send signals and as a receiving end to receive signals, thereby realizing underwater acoustic channel measurement.
In one embodiment, as shown in fig. 2, there is provided an underwater acoustic channel measurement method, which is described by taking the method as an example applied to the unmanned underwater vehicle in fig. 1, and includes the following steps:
step 202, obtaining ocean parameters and local state information of the underwater vehicle.
Here, the ocean parameters refer to ocean-related parameters of the current sea area, such as temperature, salinity, flow velocity, and density. The underwater vehicle is mainly an Autonomous Underwater Vehicle (AUV), an unmanned instrument navigated underwater by remote or automatic control, chiefly an intelligent system that replaces divers or small manned submarines in high-risk underwater operations such as deep-sea detection, lifesaving, and mine removal. The state information of an underwater vehicle includes the AUV's speed and position. The preset range is a manually set range: for example, it may be a circular region centered on the current AUV with a preset radius, or a preset number of AUVs nearest to the current AUV, e.g. the 5 nearest AUVs; the latter prevents the situation in which no other AUV falls within the preset range when the surrounding AUVs are far from the current AUV and the range is defined by a radius. The specific preset range can be set according to the actual measurement requirements of the underwater acoustic channel and is not limited here. The local state information comprises the position information of the current underwater vehicle and the state information of the underwater vehicles within the preset range; it may also comprise the state information of the current underwater vehicle together with that of the underwater vehicles within the preset range, or the state information of all AUVs in the AUV cluster. The state information includes the current AUV's speed, moving direction, and communication state (which AUV it communicates with, and whether it acts as a receiving end or a transmitting end), as well as the relative positions of the current AUV and the other AUVs (i.e. the topology of the AUV cluster).
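The two "preset range" conventions described above (a fixed radius around the current AUV, or the K nearest AUVs) can be sketched directly; the function names are illustrative and `math.dist` computes the Euclidean distance between the coordinate tuples.

```python
# Two simple conventions for the "preset range" around the current AUV.
import math

def neighbors_by_radius(own_pos, other_positions, radius):
    """AUVs inside the preset radius around the current AUV."""
    return [p for p in other_positions if math.dist(own_pos, p) <= radius]

def neighbors_nearest_k(own_pos, other_positions, k):
    """The k AUVs nearest to the current AUV (e.g. k = 5), which avoids an
    empty neighborhood when all other AUVs are far away."""
    return sorted(other_positions, key=lambda p: math.dist(own_pos, p))[:k]

own = (0.0, 0.0, -10.0)
others = [(1.0, 0.0, -10.0), (50.0, 0.0, -10.0), (2.0, 0.0, -10.0)]
```

The nearest-K variant always returns K neighbors regardless of how dispersed the cluster is, which is exactly the failure mode of the radius variant noted above.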
Specifically, the current underwater vehicle acquires, via its onboard sensors, ocean-related parameters of the current sea area such as temperature, salinity, flow velocity, and density, its own position information, and the position, speed, and communication-state information of the other underwater vehicles within the preset range. In an optional embodiment, the current underwater vehicle can additionally acquire its own speed information alongside its position information. In another optional embodiment, the current underwater vehicle can acquire, via its sensors, the ocean-related parameters of the current sea area and the position, speed, and communication-state information of all the underwater vehicles in the AUV cluster.
The set of AUVs in the cluster is denoted 𝒩 = {1, 2, ..., N}, where N is the total number of AUVs in the cluster. For each AUV i ∈ 𝒩, let v_i and p_i denote its speed and position, respectively. The embodiment of the application considers the AUV-assisted mobile channel measurement planning problem in a three-dimensional coordinate system, so v_i ∈ R^3 and p_i ∈ R^3.
Step 204, inputting the ocean parameters, the local state information of the underwater vehicle, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into the pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round.
The underwater acoustic channel measurement strategy is essentially the current AUV's action at the next moment, including the current AUV's position at the next moment, the average speed while moving from the current position to that position, and which AUV it communicates with at the next moment (as a receiving end or a transmitting end).
Specifically, the current underwater vehicle obtains the previous round's underwater acoustic channel measurement data from its own database, obtains the previous round's measurement data of the other underwater vehicles within the preset range by broadcast, and inputs the acquired ocean parameters of the current sea area, its own position information, the state information of the underwater vehicles within the preset range, and the previous round's measurement data of itself and of the underwater vehicles within the preset range into the pre-trained multi-agent reinforcement learning model; the underwater acoustic channel measurement strategy of the current underwater vehicle is then output through the current underwater vehicle's strategy function.
Step 206, performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data.
The underwater acoustic channel measurement data mainly refer to channel parameters, specifically parameters of the channel frequency response.
Specifically, the current AUV executes the corresponding actions according to the current round's underwater acoustic channel measurement strategy, namely the position at the next moment, the speed when moving to that position, and which AUV to communicate with at the next moment (as a receiving end or a transmitting end), and then performs the underwater acoustic channel measurement to obtain the current round's underwater acoustic channel data measured by the current AUV.
In the underwater acoustic channel measurement method, ocean parameters and local state information of the underwater vehicle are acquired; the ocean parameters, the local state information, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range are input into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy for the current round; and underwater acoustic channel measurement is performed according to that strategy to obtain the current round's underwater acoustic channel measurement data. With this scheme, the current round's measurement strategy is obtained immediately once real-time local environment information is input into the multi-agent reinforcement learning model. Compared with a fixed transceiving communication system, this reduces transmission delay, improves measurement efficiency, reduces the influence of channel attenuation, and improves the spatial coverage of underwater acoustic channel feature collection; because a plurality of unmanned underwater vehicles measure the underwater acoustic channel while moving, the control flexibility of the spatial coverage is enhanced, and the measurement accuracy is therefore high.
In an optional embodiment, after performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the current round's underwater acoustic channel measurement data, the method further includes: when the current round's underwater acoustic channel measurement data meet the preset validity condition, acquiring the current ocean parameters and the local state information of all the underwater vehicles; and training the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of each underwater vehicle, and the current round's underwater acoustic channel measurement strategy.
The preset validity condition may be set according to the physical laws governing the measured channel frequency response parameters: for example, the time-domain response should be sparse, and the frequency-domain response should be mountain-shaped, narrow on the left and wide on the right.
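A validity check of this kind can be sketched as below. This is a hedged illustration, not the patent's criterion: the sparsity threshold, the tap-magnitude cutoff, and the reading of "mountain-shaped" as a single rise-then-fall peak are all assumptions.

```python
# Illustrative validity check: the time-domain impulse response should be
# sparse (few significant taps), and the frequency-response magnitude should
# form a single "mountain" (rise to one peak, then fall). Thresholds are
# assumed values, not from the patent.

def is_sparse(impulse_response, eps=0.05, max_active_fraction=0.2):
    active = sum(1 for tap in impulse_response if abs(tap) > eps)
    return active <= max_active_fraction * len(impulse_response)

def is_unimodal(magnitudes):
    peak = magnitudes.index(max(magnitudes))
    rising = all(magnitudes[i] <= magnitudes[i + 1] for i in range(peak))
    falling = all(magnitudes[i] >= magnitudes[i + 1]
                  for i in range(peak, len(magnitudes) - 1))
    return rising and falling

def measurement_is_valid(impulse_response, freq_magnitudes):
    return is_sparse(impulse_response) and is_unimodal(freq_magnitudes)
```

Only rounds that pass such a check would be used for the online model update described next.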
Specifically, in the task execution stage of the AUV cluster, the underwater acoustic channel measurement data obtained in each round and the state information of each AUV in the cluster can be used to update the parameters of the multi-agent reinforcement learning model online in real time, making the next round's decision more accurate. First, the current underwater vehicle judges whether the channel frequency response parameters obtained by the current measurement meet the preset validity condition; when they do, the current underwater vehicle acquires the ocean parameters of the current sea area, the local state information of the other underwater vehicles within the current preset range, and the underwater acoustic channel measurement strategies of the underwater vehicles within the preset range, and uses them to adjust the deep neural network parameters in the multi-agent reinforcement learning model.
Further, in the task execution stage of the AUV cluster, the acquired ocean parameters of the current sea area, the position information of the current AUV, the state information of the other AUVs within the preset range and the previous round of underwater acoustic channel measurement data of the AUVs within the preset range are input into the trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement data of the current round. The ocean parameters of the current sea area, the local state information of the other underwater vehicles within the current preset range and the underwater acoustic channel measurement strategies of the underwater vehicles within the preset range are then input into the model so as to update the parameters of its deep neural network and obtain an updated model, with which the next round of decisions is made, until the current underwater acoustic channel measurement task is finished. The specific condition for ending the underwater acoustic channel measurement task may be that the AUV cluster has measured a water body of a preset volume, or that the number of measurements has reached a preset quantity.
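The execution-stage round described above can be sketched as follows; the linear `actor` and the `measure_fn` / `is_valid_fn` / `online_update_fn` callables are hypothetical stand-ins for the trained strategy network, the acoustic sounding procedure, the preset effective condition and the online parameter adjustment:

```python
import numpy as np

rng = np.random.default_rng(0)

def actor(theta, observation):
    """Linear stand-in for the trained strategy network pi_i(o)."""
    return np.tanh(theta @ observation)

def execute_measurement_round(theta, ocean_params, local_states, prev_measurements,
                              measure_fn, is_valid_fn, online_update_fn):
    """One execution-stage round for a single AUV (distributed execution).

    The AUV builds a local observation, obtains this round's measurement
    strategy from the actor, measures the channel, and adjusts the network
    parameters online only when the data meets the effective condition.
    """
    observation = np.concatenate([ocean_params, local_states, prev_measurements])
    strategy = actor(theta, observation)      # this round's measurement strategy
    data = measure_fn(strategy)               # this round's channel measurement data
    if is_valid_fn(data):                     # online update only on valid data
        theta = online_update_fn(theta, observation, strategy, data)
    return theta, data

theta0 = rng.normal(size=(2, 12))
theta1, data = execute_measurement_round(
    theta0,
    ocean_params=np.ones(4), local_states=np.ones(4), prev_measurements=np.ones(4),
    measure_fn=lambda s: 0.5 * s,
    is_valid_fn=lambda d: True,
    online_update_fn=lambda th, o, s, d: 0.99 * th)
```

Because the validity check passed, the returned `theta1` differs from `theta0`, modelling the online parameter update between rounds.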
In an alternative embodiment, the process of training the multi-agent reinforcement learning model comprises: obtaining sample ocean parameters, sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy; inputting the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy into a sample strategy function to obtain the sample measurement strategy of the current round; performing underwater acoustic channel measurement according to the sample measurement strategy to obtain the sample underwater acoustic channel measurement data of the current round; calculating a reward value corresponding to the sample measurement strategy from the sample underwater acoustic channel measurement data and a reference result; and updating the sample strategy function according to the reward value until the difference between the reward value of the previous round and the reward value of the current round is smaller than a preset threshold, thereby obtaining the multi-agent reinforcement learning model.
The sample state information in the training stage comprises the state information of all AUVs in the AUV cluster.
Specifically, in the training stage, the current underwater vehicle acquires, via its onboard sensors, sample ocean parameters of the current sea area such as temperature, salinity, flow velocity and density, sample state information such as the position, speed and communication states of all the underwater vehicles, and the previous round of sample underwater acoustic channel measurement strategy. The sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy are input into the initial sample strategy function of the action network of the MARL to obtain the sample measurement strategy of the current round, and the current AUV performs measurement according to the sample measurement strategy output by the action network to obtain the sample underwater acoustic channel measurement data of the current round. The sample underwater acoustic channel measurement data, the measured ocean parameters and the measured state information of the AUV cluster of the current round are then input into the initial evaluation network of the MARL; the reward value of the current round (namely the cumulative expected reward value) is calculated from the sample underwater acoustic channel measurement data of the current round and the reference result of the sample set, and the sample strategy function is updated according to this reward value until the difference between the reward value of the previous round and the reward value of the current round is smaller than the preset threshold, thereby obtaining the multi-agent reinforcement learning model.
For example, let π = (π_1, π_2, ..., π_N)^T denote the underwater acoustic channel measurement strategies of the AUV cluster and θ = (θ_1, θ_2, ..., θ_N)^T the corresponding strategy parameters, i.e. π_i = π_i(θ_i) for any 1 ≤ i ≤ N. For AUV i with a given strategy parameter θ_i, the cumulative expected reward function is expressed as

$J(\theta_i)=\mathbb{E}_{s\sim p^{\pi},\,a_i\sim\pi_i}\Big[\sum_{t=0}^{\infty}\gamma^{t}r_i^{t}\Big]$  (1)

where $\mathbb{E}$ denotes the mathematical expectation, 0 < γ ≤ 1 is the discount coefficient, $r_i^{t}$ is the reward value of AUV i at time t, s denotes the state information of AUV i, and $a_i$ denotes the action of AUV i.
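The cumulative discounted sum in the reward function above can be computed directly; this small helper is a straightforward sketch:

```python
def discounted_return(rewards, gamma=0.95):
    """Cumulative discounted reward  sum over t of gamma**t * r_t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```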
During the first training round, the sample underwater acoustic channel measurement strategy may be an initialized strategy; it is then continuously optimized in subsequent training, and training stops when the preset threshold is reached, yielding the trained multi-agent reinforcement learning model.
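The outer training loop with the reward-difference stopping rule might look like the following sketch; `run_round`, `compute_reward` and the placeholder gradient step are illustrative stand-ins rather than the patent's actual networks:

```python
def train_until_converged(initial_policy, run_round, compute_reward,
                          threshold=1e-3, max_rounds=1000):
    """Centralized-training outer loop (sketch).

    `run_round` executes one measurement round with the current policy and
    returns sample channel data; `compute_reward` scores that data against
    the reference results.  Training stops when successive round rewards
    differ by less than `threshold`, matching the stopping rule above.
    """
    policy, prev_reward = initial_policy, None
    for _ in range(max_rounds):
        data = run_round(policy)
        reward = compute_reward(data)
        if prev_reward is not None and abs(prev_reward - reward) < threshold:
            break
        policy = policy + 0.1 * reward   # placeholder for the gradient update
        prev_reward = reward
    return policy, reward

# Toy convergence check: the reward shrinks as the policy approaches 2.0.
final_policy, final_reward = train_until_converged(
    initial_policy=0.0,
    run_round=lambda p: p,
    compute_reward=lambda d: 2.0 - d)
```

In this toy setting each round shrinks the reward geometrically, so the reward difference eventually falls below the threshold and the loop stops near the optimum.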
Furthermore, the MARL-based underwater acoustic channel cooperative measurement strategy of the AUV cluster adopts a centralized-training, distributed-execution working mode. During neural network training, the state information and sample measurement strategies of all AUVs must therefore be acquired, so that the learned neural network parameters are more accurate; in the execution stage, only local state information and local underwater acoustic channel measurement data are needed to make a fast decision.
In an alternative embodiment, the previous round of sample underwater acoustic channel measurement strategy comprises the previous round of sample underwater acoustic channel measurement strategies of all the underwater vehicles; correspondingly, obtaining the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy comprises the following steps: obtaining the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy of the current underwater vehicle; and obtaining the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles.
Specifically, in the training stage, the previous round of sample underwater acoustic channel measurement strategies of all the underwater vehicles are obtained, including the previous round of sample underwater acoustic channel measurement strategy of the current AUV and those of the other AUVs. The current AUV obtains its own previous round of sample underwater acoustic channel measurement strategy from its local database, while the previous round of sample underwater acoustic channel measurement strategies of the other AUVs are obtained through the strategy functions of the other AUVs trained by the current AUV.
In this embodiment, the previous round of sample underwater acoustic channel measurement strategies of the other AUVs are obtained through the strategy functions of the other AUVs trained by the current AUV, so the current AUV does not need to obtain them through global communication. This greatly relieves the communication load of the AUV cluster and avoids the potential network-crash problem caused by high communication delay.
In an alternative embodiment, updating the sample policy function according to the reward value includes: and updating the sample strategy function according to the reward value, the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round and the sample underwater acoustic channel measurement strategies of other underwater vehicles.
Specifically, the sample strategy function can be updated according to the reward value, the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round, the sample underwater acoustic channel measurement strategies of other underwater vehicles and the strategy gradient of the sample strategy function.
The strategy adopted in the embodiments of the present application is a stochastic strategy, i.e. the strategy π is a distribution, and the strategy gradient is

$\nabla_{\theta_i}J(\theta_i)=\mathbb{E}_{x,a\sim\mathcal{D}}\big[\nabla_{\theta_i}\log\pi_i(a_i\mid o_i)\,Q_i^{\pi}(x,a_1,\ldots,a_N)\big]$  (2)

where $o_i$ denotes the observation vector (i.e. the reference result) of AUV i with respect to the surrounding environment (the positions and motion states of the AUVs within the preset range) and the underwater acoustic channel, $x=(o_1,o_2,\ldots,o_N)^{T}$ denotes the overall observation vector, namely the state, of the AUV cluster, $\mathcal{D}$ denotes the experience replay pool, and $Q_i^{\pi}$ denotes the state-action function (i.e. the sample strategy function) of AUV i in the centralized training process. Since each AUV learns its own $Q_i^{\pi}$ function locally, each AUV can have a different reward function, which facilitates cooperative mission planning.
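A minimal Monte-Carlo estimate of the strategy gradient of equation (2), using a discrete softmax policy as a simplified stand-in for the actor network; the two-action setup and the Q-values are illustrative assumptions:

```python
import numpy as np

def policy_gradient_estimate(logits, samples, q_values):
    """Monte-Carlo estimate of E[ grad log pi(a|o) * Q ] for a discrete
    softmax policy over the given logits (a stand-in for the actor)."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = np.zeros_like(logits, dtype=float)
    for a, q in zip(samples, q_values):
        one_hot = np.eye(len(logits))[a]
        # gradient of log softmax at sampled action a:  one_hot(a) - p
        grad += (one_hot - p) * q
    return grad / len(samples)

# Action 0 has the higher Q-value, so the estimate pushes probability
# mass toward action 0 and away from action 1.
grad_est = policy_gradient_estimate(np.zeros(2), samples=[0, 1], q_values=[1.0, 0.0])
```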
Under centralized training, $Q_i^{\pi}$ is updated by minimizing

$\mathcal{L}(\theta_i)=\mathbb{E}_{x,a,r,x'}\big[(Q_i^{\pi}(x,a_1,\ldots,a_N)-y)^{2}\big]$  (3)

where

$y=r_i+\gamma\,Q_i^{\pi'}(x',a'_1,\ldots,a'_N)\big|_{a'_k=\pi'_k(o_k)}$  (4)

Here $Q_i^{\pi'}$ denotes the target network, π′ is the target strategy and $\theta'_k$ are the parameters of the target strategy. The target network is the topological network formed by the AUV cluster when the sample strategy function converges to the optimal value, the target strategy is the AUV strategy function at convergence, and the parameters of the target strategy are the parameters of that strategy function. Calculating y in the update function of the sample strategy function requires the sample underwater acoustic channel measurement strategies $\pi'_k$ of every AUV other than the current AUV.
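The target value y of equation (4) can be sketched as follows; the linear target critic and its weights are hypothetical stand-ins for the target network:

```python
import numpy as np

def td_target(reward, next_state_action, target_critic, gamma=0.95):
    """Target value y = r_i + gamma * Q'(x', a'_1..a'_N) of equation (4).

    `target_critic` plays the role of the slowly-updated target network
    Q_i^{pi'}; here it is a simple linear function for illustration.
    """
    return reward + gamma * target_critic(next_state_action)

w_target = np.array([0.5, -0.25])          # hypothetical target-critic weights
critic = lambda xa: float(w_target @ xa)

# 0.5*2.0 - 0.25*4.0 = 0, so y = 1.0 + 0.95*0 = 1.0
y = td_target(reward=1.0, next_state_action=np.array([2.0, 4.0]), target_critic=critic)
```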
In an optional embodiment, the obtaining of the corresponding sample underwater acoustic channel measurement strategy according to the respective strategy functions of the other underwater vehicles includes: training respective strategy functions of other underwater vehicles according to historical sample underwater acoustic channel measurement strategies of other underwater vehicles and the reference result; and when the error between the sample underwater acoustic channel measurement data obtained through the respective strategy functions of the other underwater vehicles and the reference result is smaller than a preset error threshold value, obtaining the strategy functions of the other underwater vehicles.
Specifically, the current AUV may obtain the strategy approximation functions of the other AUVs, which are trained according to the historical sample underwater acoustic channel measurement strategies of the other underwater vehicles and the reference result, and store them in the local memory of the current AUV.
For example, each AUV maintains a strategy approximation function of every other AUV: $\hat{\pi}_{\phi_i^j}$ denotes the approximation, with parameters $\phi_i^j$, by AUV i of the distribution of the strategy $\pi_j$ of AUV j. The approximation performance is described by a logarithmic cost function, i.e.

$\mathcal{L}(\phi_i^j)=-\mathbb{E}_{o_j,a_j}\big[\log\hat{\pi}_{\phi_i^j}(a_j\mid o_j)+\lambda H(\hat{\pi}_{\phi_i^j})\big]$  (5)

where H is the information entropy and λ is an entropy regularization coefficient. Minimizing this cost function yields the strategy approximation functions of the other AUVs, i.e. the strategy functions of the other underwater vehicles are obtained when the error between the sample underwater acoustic channel measurement data obtained through the respective strategy functions of the other underwater vehicles and the reference result is smaller than the preset error threshold. Thus, equation (4) can be re-expressed as

$\hat{y}=r_i+\gamma\,Q_i^{\pi'}\big(x',\hat{\pi}'_{\phi_i^1}(o_1),\ldots,\pi'_i(o_i),\ldots,\hat{\pi}'_{\phi_i^N}(o_N)\big)$  (6)

where $\hat{\pi}'_{\phi_i^j}$ denotes the target network of the approximation function; the strategy update can then be realized by equation (2).
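A toy illustration of fitting a strategy approximation function by minimizing a logarithmic cost with an entropy term, in the spirit of the cost function above; the discrete, observation-independent policy, the regularization weight and the learning-rate settings are simplifying assumptions:

```python
import numpy as np

def fit_policy_approximation(observed_actions, n_actions=3, lam=0.01,
                             lr=0.5, steps=200):
    """Fit a discrete approximation pi_hat of another AUV's strategy by
    gradient descent on  -E[log pi_hat(a_j)] - lam * H(pi_hat)."""
    logits = np.zeros(n_actions)
    counts = np.bincount(observed_actions, minlength=n_actions) / len(observed_actions)
    for _ in range(steps):
        p = np.exp(logits - logits.max())
        p /= p.sum()
        # gradient of the negative log-likelihood term: p - empirical dist.
        grad_nll = p - counts
        # gradient of -lam*H(p) with respect to the logits
        grad_ent = lam * p * (np.log(p) - (p * np.log(p)).sum())
        logits -= lr * (grad_nll + grad_ent)
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Observed actions of the other AUV: action 0 occurs most often,
# so the fitted approximation concentrates its mass there.
pi_hat = fit_policy_approximation(np.array([0, 0, 0, 1, 1, 2]))
```

With a small entropy weight the fit stays close to the empirical action frequencies while being slightly smoothed toward uniform.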
In an alternative embodiment, updating the sample strategy function according to the reward value includes: for each underwater vehicle, randomly sampling from the strategy function set of the underwater vehicle and, each time a sub-strategy function is obtained, determining the overall reward of the strategy function set; if the overall reward is greater than a preset overall reward condition, determining the currently sampled sub-strategy function as the sample strategy function of the underwater vehicle; and if the overall reward is less than or equal to the preset overall reward condition, updating the currently sampled sub-strategy function according to its update gradient and continuing the random sampling.
Specifically, since MARL poses a dynamic planning problem in decision making whose solution changes dynamically, the strategy function of each AUV is iterated continuously during actual execution and affects the global state of the AUV cluster. For a given AUV, the strategy function is therefore unstable, and overfitting may occur during training.
In this embodiment, a strategy-function-set optimization method is used to update the sample strategy function during the update process. Specifically, the sample strategy function of each AUV consists of a strategy function set; in each training round, a sub-strategy function is obtained by random sampling from the set, measurement is performed according to the obtained sub-strategy function, a reward value is calculated from the sample underwater acoustic channel measurement data obtained by the measurement, and the calculated reward value is taken as the overall reward of the strategy function set.
For example, the strategy $\pi_i$ of AUV i consists of a strategy function set with K sub-strategy functions, and only one sub-strategy function $\pi_i^{(k)}$ is randomly adopted in each training round; i.e. the optimization objective of each AUV is to maximize the overall reward of the strategy set:

$J_e(\pi_i)=\mathbb{E}_{k\sim\mathrm{unif}(1,K),\,s\sim p^{\pi},\,a\sim\pi_i^{(k)}}\Big[\sum_{t}\gamma^{t}r_i^{t}\Big]$  (7)

where unif(1, K) is the discrete uniform distribution from 1 to K. A separate experience replay pool $\mathcal{D}_i^{(k)}$ is constructed for each sub-strategy function $\pi_i^{(k)}$, and the sub-strategy functions are updated according to their update gradients over these replay pools. To optimize the overall effect of the strategy function set, the update gradient of each sub-strategy function is

$\nabla_{\theta_i^{(k)}}J_e(\pi_i)=\frac{1}{K}\,\mathbb{E}_{x,a\sim\mathcal{D}_i^{(k)}}\big[\nabla_{\theta_i^{(k)}}\log\pi_i^{(k)}(a_i\mid o_i)\,Q_i^{\pi_i}(x,a_1,\ldots,a_N)\big]$  (8)
If the overall reward is greater than the preset overall reward condition, the currently sampled sub-strategy function is determined as the sample strategy function of the underwater vehicle; the underwater acoustic channel is measured according to the sample measurement strategy, and the sample strategy function is trained according to the measured underwater acoustic channel measurement data and the reference result to obtain the trained multi-agent reinforcement learning model, so that the AUV cluster performs underwater acoustic channel data measurement according to the trained model.
If the overall reward is less than or equal to the preset overall reward condition, the currently sampled sub-strategy function is updated according to its update gradient (i.e. equation (8)) and random sampling continues: the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy are re-acquired and input into the updated sub-strategy function to obtain the underwater acoustic channel measurement strategy of the current round; underwater acoustic channel measurement is performed according to the sample measurement strategy to obtain the sample underwater acoustic channel measurement data of the current round; a reward value is calculated from the measured sample underwater acoustic channel measurement data and taken as the overall reward of the strategy function set, and the relationship between the overall reward and the preset overall reward condition is judged again.
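The sample-then-gate procedure above can be sketched as follows; the scalar sub-strategies, the reward function and the additive stand-in for the equation (8) update gradient are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def select_sub_policy(sub_policies, overall_reward_fn, reward_condition,
                      update_fn, max_tries=100):
    """Sample sub-strategies uniformly (unif(1, K)) until the set's overall
    reward exceeds the preset condition; otherwise apply the sub-strategy's
    update and resample, as in the embodiment above."""
    for _ in range(max_tries):
        k = int(rng.integers(len(sub_policies)))
        reward = overall_reward_fn(sub_policies[k])
        if reward > reward_condition:
            return k, sub_policies[k]
        sub_policies[k] = update_fn(sub_policies[k])  # gradient-step stand-in
    raise RuntimeError("no sub-policy met the reward condition")

# Toy set of K=3 scalar sub-strategies; the reward is the policy value
# itself, and each rejected sample is nudged upward by the update.
policies = [0.0, 0.0, 0.0]
k, chosen = select_sub_policy(policies,
                              overall_reward_fn=lambda p: p,
                              reward_condition=0.5,
                              update_fn=lambda p: p + 0.2)
```

After a few rejected samples one sub-strategy accumulates enough updates to clear the reward condition and is returned as the sample strategy function.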
In this embodiment, the sample strategy function is updated by the strategy-function-set optimization method, which reduces overfitting of the sample strategy function, makes the obtained sample strategy function more accurate, and in turn makes the measured underwater acoustic channel data more accurate.
In order to make the technical solution provided by the embodiments of the present application easy to understand, the underwater acoustic channel measurement method provided by the embodiments is briefly described below through its complete procedure:
(1) And acquiring ocean parameters and local state information of the underwater vehicle.
(2) Inputting the ocean parameters, the local state information of the underwater vehicle, the current underwater vehicle and the underwater acoustic channel measurement data measured in the previous round of the underwater vehicle in the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy of the current round.
(3) And performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the underwater acoustic channel measurement data of the current round.
(4) When the underwater acoustic channel measurement data of the current wheel meet preset effective conditions, current ocean parameters and local state information of all the underwater vehicles are obtained;
and training the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of each underwater vehicle and the underwater acoustic channel measurement strategy of the current round so as to make the next round of decision until the current underwater acoustic channel measurement task is finished.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least a portion of the steps in fig. 2 may comprise multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided an underwater acoustic channel measuring apparatus including: an acquisition module 302, a decision module 304, and a measurement module 306, wherein:
the obtaining module 302 is configured to obtain marine parameters and local state information of the underwater vehicle, where the local state information includes position information of the current underwater vehicle and state information of the underwater vehicle within a preset range.
And the decision module 304 is used for inputting the marine parameters, the local state information of the underwater vehicle, the current underwater vehicle and the underwater acoustic channel measurement data measured in the previous round of the underwater vehicle in the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy of the round.
And the measuring module 306 is configured to perform underwater acoustic channel measurement according to an underwater acoustic channel measurement strategy to obtain the underwater acoustic channel measurement data of the current round.
In one embodiment, the underwater acoustic channel measuring apparatus further comprises a judging module, configured to obtain the current ocean parameters and the local state information of each underwater vehicle when the underwater acoustic channel measurement data of the current round meets the preset effective condition; and to train the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of all the underwater vehicles and the underwater acoustic channel measurement strategy of the current round.
In one embodiment, the underwater acoustic channel measuring apparatus further comprises a training module, configured to acquire sample ocean parameters, sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy; input the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy into a sample strategy function to obtain the sample measurement strategy of the current round; perform underwater acoustic channel measurement according to the sample measurement strategy to obtain the sample underwater acoustic channel measurement data of the current round; calculate a reward value corresponding to the sample measurement strategy from the sample underwater acoustic channel measurement data and a reference result; and update the sample strategy function according to the reward value until the difference between the reward value of the previous round and the reward value of the current round is smaller than a preset threshold, thereby obtaining the multi-agent reinforcement learning model.
In one embodiment, the previous round of sample underwater acoustic channel measurement strategy comprises the previous round of sample underwater acoustic channel measurement strategies of all the underwater vehicles; the training module is further configured to acquire the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy of the current underwater vehicle; and to obtain the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles.
In one embodiment, the training module is further configured to train respective strategy functions of the other underwater vehicles according to the historical sample underwater acoustic channel measurement strategies of the other underwater vehicles and the reference result; and when the error between the sample underwater acoustic channel measurement data obtained through the respective strategy functions of the other underwater vehicles and the reference result is smaller than a preset error threshold value, obtaining the strategy functions of the other underwater vehicles.
In one embodiment, the training module is further used for updating the sample strategy function according to the reward value, the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round and the sample underwater acoustic channel measurement strategies of other underwater vehicles.
In one embodiment, the training module is further configured to randomly sample, for each underwater vehicle, from the strategy function set of the underwater vehicle, and determine the overall reward of the strategy function set each time a sub-strategy function is obtained; if the overall reward is larger than a preset overall reward condition, determining the currently sampled sub-strategy function as a sample strategy function of the underwater vehicle; and if the overall reward is less than or equal to the preset overall reward condition, updating the currently sampled sub-strategy function according to the updating gradient of the currently sampled sub-strategy function, and continuing to perform random sampling.
For specific limitations of the underwater acoustic channel measurement apparatus, reference may be made to the above limitations of the underwater acoustic channel measurement method, which are not described herein again. The modules in the underwater acoustic channel measurement device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the unmanned underwater vehicle, and can also be stored in a memory in the unmanned underwater vehicle in a software form, so that the processor can call and execute the corresponding operations of the modules.
In one embodiment, an unmanned underwater vehicle is provided, and the unmanned underwater vehicle can be a terminal, and the internal structure diagram of the unmanned underwater vehicle can be shown in fig. 4. The unmanned underwater vehicle comprises a processor, a memory, a communication interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the unmanned underwater vehicle is used for providing calculation and control capability. The memory of the unmanned underwater vehicle comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The communication interface of the unmanned underwater vehicle is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an underwater acoustic channel measurement method. The display screen of the unmanned underwater vehicle can be a liquid crystal display screen or an electronic ink display screen, and the input device of the unmanned underwater vehicle can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the unmanned underwater vehicle, and an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration relevant to the solution of the present application and does not constitute a limitation on the unmanned underwater vehicle to which the solution is applied; a particular unmanned underwater vehicle may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, there is provided an unmanned underwater vehicle comprising a memory and a processor, the memory having stored therein a computer program which when executed by the processor performs the steps of:
acquiring ocean parameters and local state information of the underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicle in a preset range;
inputting ocean parameters, local state information of the underwater vehicle, the current underwater vehicle and underwater acoustic channel measurement data measured in the previous round of the underwater vehicle in a preset range into a pre-trained multi-agent reinforcement learning model to obtain an underwater acoustic channel measurement strategy of the current round;
and performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the underwater acoustic channel measurement data of the current round.
In one embodiment, the processor, when executing the computer program, further performs the steps of: carrying out underwater acoustic channel measurement according to an underwater acoustic channel measurement strategy, and after acquiring the underwater acoustic channel measurement data of the current round, further comprising: when the underwater acoustic channel measurement data of the current round accord with preset effective conditions, current ocean parameters and local state information of all underwater vehicles are obtained; and training the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of each underwater vehicle and the underwater acoustic channel measurement strategy of the current round.
In one embodiment, the processor, when executing the computer program, further performs the steps of: a process for training the multi-agent reinforcement learning model, comprising: obtaining sample ocean parameters, sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy; inputting the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy into a sample strategy function to obtain the sample measurement strategy of the current round; performing underwater acoustic channel measurement according to the sample measurement strategy to obtain the sample underwater acoustic channel measurement data of the current round; calculating a reward value corresponding to the sample measurement strategy from the sample underwater acoustic channel measurement data and a reference result; and updating the sample strategy function according to the reward value until the difference between the reward value of the previous round and the reward value of the current round is smaller than a preset threshold, thereby obtaining the multi-agent reinforcement learning model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: the previous round of sample underwater acoustic channel measurement strategy comprises the previous round of sample underwater acoustic channel measurement strategies of all the underwater vehicles; obtaining the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy comprises the following steps: obtaining the sample ocean parameters, the sample state information of the underwater vehicles and the previous round of sample underwater acoustic channel measurement strategy of the current underwater vehicle; and obtaining the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles.
In one embodiment, the processor, when executing the computer program, further performs the steps of: obtaining corresponding sample underwater acoustic channel measurement strategies according to respective strategy functions of other underwater vehicles, wherein the strategies comprise: training respective strategy functions of other underwater vehicles according to the historical sample underwater acoustic channel measurement strategies of other underwater vehicles and the reference result; and when the error between the sample underwater acoustic channel measurement data obtained through the respective strategy functions of the other underwater vehicles and the reference result is smaller than a preset error threshold value, obtaining the strategy functions of the other underwater vehicles.
In one embodiment, the processor, when executing the computer program, further performs the steps of: updating the sample policy function according to the reward value, comprising: and updating the sample strategy function according to the reward value, the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round and the sample underwater acoustic channel measurement strategies of other underwater vehicles.
In one embodiment, the processor, when executing the computer program, further performs the steps of: updating the sample policy function according to the reward value, comprising: for each underwater vehicle, randomly sampling from a strategy function set of the underwater vehicle, and determining the overall reward of the strategy function set when each sub-strategy function is obtained; if the overall reward is larger than a preset overall reward condition, determining the currently sampled sub-strategy function as a sample strategy function of the underwater vehicle; and if the overall reward is less than or equal to the preset overall reward condition, updating the currently sampled sub-strategy function according to the updating gradient of the currently sampled sub-strategy function, and continuing to perform random sampling.
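By way of illustration and not limitation, the sample-accept-or-update scheme above can be sketched as follows; the scalar policy parameter, the quadratic overall-reward surrogate, and all names are assumptions of this sketch, not the patent's implementation (a real system would obtain the overall reward from a measurement round):

```python
import random

class SubPolicy:
    """Toy sub-strategy function with one scalar parameter (hypothetical stand-in)."""
    def __init__(self, theta):
        self.theta = theta

    def overall_reward(self):
        # Illustrative overall reward of the set under this sub-policy, peaking at theta = 1.0.
        return 1.0 - (self.theta - 1.0) ** 2

    def update(self, lr=0.1):
        # One ascent step along this sub-policy's own update gradient.
        self.theta += lr * (-2.0 * (self.theta - 1.0))

def sample_policy(policy_set, reward_condition, max_iters=1000, seed=0):
    """Randomly sample sub-policies; accept one whose overall reward exceeds the
    preset condition, otherwise gradient-update the sampled sub-policy and resample."""
    rng = random.Random(seed)
    sub = None
    for _ in range(max_iters):
        sub = rng.choice(policy_set)
        if sub.overall_reward() > reward_condition:
            return sub
        sub.update()
    return sub

policies = [SubPolicy(t) for t in (-1.0, 0.0, 3.0)]
chosen = sample_policy(policies, reward_condition=0.99)
```

Because every rejected sample moves the chosen sub-policy up its own gradient, some sub-policy eventually clears the preset overall-reward condition and is accepted.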
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring ocean parameters and local state information of an underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicle in a preset range;
inputting the ocean parameters, the local state information of the underwater vehicle, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into a pre-trained multi-agent reinforcement learning model to obtain an underwater acoustic channel measurement strategy of the current round;
and performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the underwater acoustic channel measurement data of the round.
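By way of illustration and not limitation, one measurement round per the steps above can be sketched as follows; the data types, the averaging "model", and the echo "measurement" are placeholders invented for this sketch (the patent's model is a trained multi-agent reinforcement learning model):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class LocalState:
    """Local state: own position plus states of the underwater vehicles
    within the preset range (illustrative structure)."""
    position: Tuple[float, float, float]
    neighbor_states: List[Tuple[float, float, float]] = field(default_factory=list)

def measurement_round(model: Callable, ocean_params: dict, state: LocalState,
                      own_prev_data: float, neighbor_prev_data: List[float],
                      measure: Callable) -> float:
    """Feed ocean parameters, local state, and previous-round data (own data
    from the local database, neighbors' data received by broadcast) to the
    pretrained model, then measure with the strategy it returns."""
    strategy = model(ocean_params, state, own_prev_data, neighbor_prev_data)
    return measure(strategy)

# Toy stand-ins: the "model" averages previous-round data; the "measurement"
# simply echoes the strategy back as this round's channel data.
toy_model = lambda p, s, own, nbrs: (own + sum(nbrs)) / (1 + len(nbrs))
toy_measure = lambda strategy: strategy

state = LocalState(position=(0.0, 0.0, -50.0),
                   neighbor_states=[(10.0, 0.0, -50.0)])
data = measurement_round(toy_model, {"salinity": 35.0}, state,
                         own_prev_data=1.0, neighbor_prev_data=[3.0],
                         measure=toy_measure)  # averages 1.0 and 3.0 -> 2.0
```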
In one embodiment, the computer program when executed by the processor further performs the steps of: carrying out underwater acoustic channel measurement according to an underwater acoustic channel measurement strategy, and after acquiring the underwater acoustic channel measurement data of the current round, further comprising: when the underwater acoustic channel measurement data of the current round accord with preset effective conditions, current ocean parameters and local state information of all underwater vehicles are obtained; and training the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of each underwater vehicle and the underwater acoustic channel measurement strategy of the current round.
In one embodiment, the computer program when executed by the processor further performs the steps of: the training process of the multi-agent reinforcement learning model comprises the following steps: obtaining sample ocean parameters, sample state information of an underwater vehicle and a previous round of sample underwater acoustic channel measurement strategy; inputting the sample ocean parameters, the sample state information of the underwater vehicle and the previous round of sample underwater acoustic channel measurement strategy into a sample strategy function to obtain a sample measurement strategy of the current round; performing underwater acoustic channel measurement according to the sample measurement strategy to obtain sample underwater acoustic channel measurement data of the current round; comparing the sample underwater acoustic channel measurement data with a reference result to obtain a reward value corresponding to the sample measurement strategy; and updating the sample strategy function according to the reward value until the difference value between the reward value of the previous round and the reward value of the current round is smaller than a preset threshold value, so as to obtain the multi-agent reinforcement learning model.
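By way of illustration and not limitation, this training loop can be sketched numerically under toy assumptions (a scalar strategy parameter, a linear "measurement", a negative-squared-error reward against the reference result, and a finite-difference update — none of which are specified by the patent):

```python
def train_policy(measure, reference, w0=0.0, lr=0.2, tol=1e-10, max_rounds=1000):
    """Update the sample strategy parameter w by the reward until the
    round-to-round reward difference falls below the preset threshold."""
    w, prev_reward, reward = w0, float("-inf"), 0.0
    for _ in range(max_rounds):
        data = measure(w)                      # this round's sample measurement
        reward = -(data - reference) ** 2      # reward vs. the reference result
        if abs(reward - prev_reward) < tol:
            break                              # rewards have converged
        eps = 1e-6
        grad = (-(measure(w + eps) - reference) ** 2 - reward) / eps
        w += lr * grad                         # crude finite-difference ascent
        prev_reward = reward
    return w, reward

# Toy channel: strategy w yields measurement 2*w; the reference result is 4.0,
# so the learned strategy should approach w = 2.
w, reward = train_policy(lambda w: 2.0 * w, reference=4.0)
```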
In one embodiment, the computer program when executed by the processor further performs the steps of: the previous round of sample underwater acoustic channel measurement strategy comprises the sample underwater acoustic channel measurement strategies of all the underwater vehicles in the previous round; the obtaining of the sample ocean parameters, the sample state information of the underwater vehicle and the previous round of sample underwater acoustic channel measurement strategy comprises the following steps: obtaining the sample ocean parameters, the sample state information of the underwater vehicle and the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round; and obtaining the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles.
In one embodiment, the computer program when executed by the processor further performs the steps of: obtaining corresponding sample underwater acoustic channel measurement strategies according to respective strategy functions of other underwater vehicles, wherein the strategy comprises the following steps: training respective strategy functions of other underwater vehicles according to the historical sample underwater acoustic channel measurement strategies of other underwater vehicles and the reference result; and when the error between the sample underwater acoustic channel measurement data obtained through the respective strategy functions of the other underwater vehicles and the reference result is smaller than a preset error threshold value, obtaining the strategy functions of the other underwater vehicles.
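By way of illustration and not limitation, fitting another vehicle's strategy function from its history can be sketched as a regression; the linear function class, the mean-squared-error metric, and the synthetic history below are assumptions of this sketch, as the patent does not specify the function class:

```python
def fit_neighbor_policy(history, err_threshold=1e-3, lr=0.3, max_steps=5000):
    """Regress a linear stand-in strategy function onto a neighbor's historical
    (input, strategy) pairs; accept it once the mean squared error against the
    reference results drops below the preset error threshold."""
    a, b = 0.0, 0.0                       # stand-in strategy: a * x + b
    err = float("inf")
    for _ in range(max_steps):
        ga = gb = err = 0.0
        for x, y in history:
            d = a * x + b - y             # residual vs. the recorded strategy
            err += d * d
            ga += 2.0 * d * x
            gb += 2.0 * d
        n = len(history)
        err /= n
        if err < err_threshold:
            break                         # error below the preset threshold
        a -= lr * ga / n                  # gradient-descent step on a and b
        b -= lr * gb / n
    return (a, b), err

# Toy history generated by a neighbor following the strategy y = 0.5 * x + 1.
history = [(0.0, 1.0), (1.0, 1.5), (2.0, 2.0)]
(a, b), err = fit_neighbor_policy(history)
```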
In one embodiment, the computer program when executed by the processor further performs the steps of: updating the sample policy function according to the reward value, comprising: and updating the sample strategy function according to the reward value, the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round and the sample underwater acoustic channel measurement strategies of other underwater vehicles.
In one embodiment, the computer program when executed by the processor further performs the steps of: updating the sample strategy function according to the reward value, wherein the updating comprises the following steps: for each underwater vehicle, randomly sampling from a strategy function set of the underwater vehicle, and determining the overall reward of the strategy function set when each sub-strategy function is obtained; if the overall reward is larger than a preset overall reward condition, determining the currently sampled sub-strategy function as a sample strategy function of the underwater vehicle; and if the overall reward is less than or equal to the preset overall reward condition, updating the currently sampled sub-strategy function according to the updating gradient of the currently sampled sub-strategy function, and continuing to perform random sampling.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
All possible combinations of the technical features in the above embodiments may not be described for the sake of brevity, but should be considered as being within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An underwater acoustic channel measurement method, comprising:
acquiring ocean parameters and local state information of an underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicle in a preset range;
inputting the ocean parameters, the local state information of the underwater vehicle, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy of the current round; wherein the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle is acquired from a database of the current underwater vehicle, and the underwater acoustic channel measurement data measured in the previous round by the underwater vehicles within the preset range is acquired in a broadcasting mode;
and performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the underwater acoustic channel measurement data of the current round; when the underwater acoustic channel measurement data of the current round accords with a preset effective condition, obtaining current ocean parameters and local state information of each underwater vehicle, and training the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of each underwater vehicle and the underwater acoustic channel measurement strategy of the current round, wherein the preset effective condition is set according to the physical rules of the measured parameters of the channel frequency response.
2. The method of claim 1, wherein the underwater acoustic channel measurement data refers to a parameter of a channel frequency response.
3. The method as claimed in claim 1 or 2, wherein the training process of the multi-agent reinforcement learning model comprises:
obtaining sample ocean parameters, sample state information of an underwater vehicle and a previous round of sample underwater acoustic channel measurement strategy;
inputting the sample ocean parameters, the sample state information of the underwater vehicle and the sample underwater acoustic channel measurement strategy of the previous round into a sample strategy function to obtain a sample measurement strategy of the current round;
performing underwater acoustic channel measurement according to the sample measurement strategy to obtain sample underwater acoustic channel measurement data of the current round;
comparing the sample underwater acoustic channel measurement data with a reference result to obtain a reward value corresponding to the sample measurement strategy;
and updating the sample strategy function according to the reward value until the difference value between the reward value of the previous round and the reward value of the current round is smaller than a preset threshold value, so as to obtain the multi-agent reinforcement learning model.
4. The method of claim 3, wherein the previous round of sample underwater acoustic channel measurement strategy comprises the previous round of sample underwater acoustic channel measurement strategies of all the underwater vehicles;
the method for obtaining the sample ocean parameters, the sample state information of the underwater vehicle and the previous round of sample underwater acoustic channel measurement strategy comprises the following steps:
acquiring the sample ocean parameters, the sample state information of the underwater vehicle and the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round;
and obtaining the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles.
5. The method according to claim 4, wherein the obtaining of the corresponding sample underwater acoustic channel measurement strategies according to the respective strategy functions of the other underwater vehicles comprises:
training respective strategy functions of the other underwater vehicles according to the historical sample underwater acoustic channel measurement strategies of the other underwater vehicles and the reference result;
and when the error between the sample underwater acoustic channel measurement data obtained through the respective strategy functions of the other underwater vehicles and the reference result is smaller than a preset error threshold value, obtaining the strategy functions of the other underwater vehicles.
6. The method of claim 5, wherein updating the sample policy function according to the reward value comprises:
and updating the sample strategy function according to the reward value, the sample underwater acoustic channel measurement strategy of the current underwater vehicle in the previous round and the sample underwater acoustic channel measurement strategies of other underwater vehicles.
7. The method of claim 3, wherein updating the sample policy function according to the reward value comprises:
for each underwater vehicle, randomly sampling from a strategy function set of the underwater vehicle, and determining the overall reward of the strategy function set when each sub-strategy function is obtained;
if the overall reward is larger than a preset overall reward condition, determining the currently sampled sub-strategy function as a sample strategy function of the underwater vehicle;
and if the overall reward is less than or equal to a preset overall reward condition, updating the currently sampled sub-strategy function according to the updating gradient of the currently sampled sub-strategy function, and continuing to perform random sampling.
8. An underwater acoustic channel measurement apparatus, comprising:
the acquisition module is used for acquiring ocean parameters and local state information of the underwater vehicle, wherein the local state information comprises position information of the current underwater vehicle and state information of the underwater vehicle in a preset range;
the decision module is used for inputting the ocean parameters, the local state information of the underwater vehicle, and the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle and by the underwater vehicles within the preset range into a pre-trained multi-agent reinforcement learning model to obtain the underwater acoustic channel measurement strategy of the current round; wherein the underwater acoustic channel measurement data measured in the previous round by the current underwater vehicle is acquired from a database of the current underwater vehicle, and the underwater acoustic channel measurement data measured in the previous round by the underwater vehicles within the preset range is acquired in a broadcasting mode;
the measurement module is used for performing underwater acoustic channel measurement according to the underwater acoustic channel measurement strategy to obtain the underwater acoustic channel measurement data of the current round; when the underwater acoustic channel measurement data of the current round accords with a preset effective condition, acquiring current ocean parameters and local state information of each underwater vehicle, and training the multi-agent reinforcement learning model according to the current ocean parameters, the current local state information of each underwater vehicle and the underwater acoustic channel measurement strategy of the current round, wherein the preset effective condition is set according to the physical rules of the measured parameters of the channel frequency response.
9. An unmanned underwater vehicle comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110639526.4A 2021-06-08 2021-06-08 Underwater acoustic channel measuring method and device, unmanned underwater vehicle and storage medium Active CN113381824B (en)

Publications (2)

Publication Number Publication Date
CN113381824A CN113381824A (en) 2021-09-10
CN113381824B true CN113381824B (en) 2023-01-31


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3103295B1 (en) * 2019-11-19 2021-12-03 Commissariat Energie Atomique METHOD OF ASSOCIATION OF USER EQUIPMENT IN A CELLULAR NETWORK BY MEANS OF MULTI-AGENT REINFORCEMENT LEARNING
CN111708355B (en) * 2020-06-19 2023-04-18 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle action decision method and device based on reinforcement learning
CN112256056B (en) * 2020-10-19 2022-03-01 中山大学 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning
CN112527017B (en) * 2020-12-11 2022-02-11 中国科学院沈阳自动化研究所 Ocean observation method based on multiple AUVs



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant