CN110086518B

CN110086518B - Resilient beamforming method based on multi-armed bandit in wireless sensor network

Info

Publication number: CN110086518B
Application number: CN201910237835.1A
Authority: CN
Inventors: 侯健; 李星灿; 项梦梵
Original assignee: Zhejiang Sci Tech University ZSTU
Current assignee: Zhejiang Sci Tech University ZSTU
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2022-06-10
Anticipated expiration: 2039-03-27
Also published as: CN110086518A

Abstract

The invention relates to a resilient beamforming method based on the multi-armed bandit, comprising the following steps: the wireless sensor network contains m sensor nodes; when faulty sensor nodes interfere, each set containing different sensors is defined as an arm, giving K arms in total, where K = 2^m - 2; the number of times each arm has been selected, the estimated value of each arm, and the probability of each arm being selected at the initial moment, p_i = 1/K, are initialized; one arm is randomly selected according to the selection probabilities, all sensor nodes in the selected arm perform one phase update according to a random-grouping distributed beamforming method, the selection count of the selected arm is increased by one, the reward value of the selected arm is updated, and the estimated value of the selected arm is updated; the probability of each arm being selected is updated according to a Boltzmann distribution; and these steps are repeated until the arm containing all the normal sensor nodes is selected and the power of the signal sent by the sensor nodes is optimal. The invention is suitable for excluding faulty sensors.

Description

Resilient beamforming method based on multi-armed bandit in wireless sensor network

Technical Field

The invention belongs to the technical field of wireless communication networks, and in particular relates to a resilient beamforming method based on the multi-armed bandit in a wireless sensor network.

Background

In recent years, with the rapid development of sensor, communication, information processing, and embedded technologies, wireless sensor networks have come into wide use. A wireless sensor network senses and collects information in a target area and delivers it to the user terminal. Its application scenarios are broad; it is currently used mainly in area detection, medical monitoring, environmental monitoring, industrial monitoring, military defense, and many other fields.

A wireless sensor network consists of a large number of inexpensive miniature sensor nodes deployed in a monitored area, which form a multi-hop, self-organizing network system through wireless communication. Its purpose is to cooperatively sense, collect, and process information about sensed objects in the coverage area and send it to an observer. In practice, because of harsh and uncertain environments, limited battery power, noise, and malicious attacks by adversaries, the network often contains sensors that behave maliciously and interfere with the system. If these malicious sensors are not handled in time, the result may be unreliable data, system instability, and energy loss. How to design a strategy that excludes malicious sensors, so that every transmitting sensor is correct, is therefore an important problem for wireless sensor networks.

Beamforming is one of the techniques for improving communication quality in wireless sensor networks. It is a signal processing technique for sensor arrays that achieves directional transmission or reception by making signals at particular angles interfere constructively while other signals interfere destructively, thereby increasing the power of the transmitted signal. In a wireless sensor network, beamforming can concentrate the signals sent by the nodes in the direction of the receiving end, which increases the signal power received there. Beamforming also spreads the energy needed to transmit information evenly across the nodes that take part in the transmission, which lowers the energy consumption of each node and extends the lifetime of the network. However, if a faulty sensor takes part in the transmission, the signal power at the receiving end cannot be maximized, that is, beamforming cannot be achieved.

The multi-armed bandit problem belongs to the field of reinforcement learning. Its main idea is that, in each round, the player chooses one of several given arms, each allocating different resources, according to some strategy, in order to find the best arm and maximize the total reward over a series of trials. Here the beamforming problem with faulty sensors is transformed into a multi-armed bandit problem. By constructing an appropriate immediate reward, selecting arms repeatedly, and updating the selection probabilities, the selection probability of the best arm is driven toward 1, so that the arm selected each time eventually contains only normal sensor nodes. This removes the influence of the faulty sensors and ensures that the signal power received at the receiving end is maximized.

Wireless sensor networks place high demands on the accuracy of data collection: sensor nodes must deliver information to the user terminal without error, so interference from faulty sensors cannot be tolerated. The first methods proposed for wireless sensor networks were centralized, with a central node that receives information from all other nodes and determines their status by analyzing it. Tang and Chow proposed an algorithm called the Neighborhood Hidden Conditional Random Field (NHCRF), which judges whether a node is normal from received signal strength, frequency, and signal delay. This method relaxes the independence assumption of hidden Markov models and provides reliable diagnostic results even across different networks. Chanak and Banerjee detect faulty sensor nodes from their fault state and give a fuzzy-rule-based scheme for classifying and managing faulty nodes. This scheme improves the reusability of faulty nodes, overcomes the uncertainty problem, and improves the overall performance of the network. To avoid dependence on a central node, Lee and Choi proposed a distributed fault detection algorithm in which each node judges the state of its neighbors by comparison with its own state. The method relies on agnostic diagnosis (AD), which requires minimal domain knowledge, to detect faulty nodes, and can therefore be applied to a variety of wireless sensor networks. AD describes the state of the nodes with a continuously updated correlation graph and thereby gradually identifies the faulty nodes.

How to design a method for identifying and avoiding faulty sensors in wireless sensor networks is therefore a problem well worth studying.

Summary of the Invention

In view of the above problems in the prior art, the present invention provides a resilient beamforming method based on the multi-armed bandit in a wireless sensor network.

To achieve the above purpose, the present invention adopts the following technical solution:

A resilient beamforming method based on the multi-armed bandit in a wireless sensor network, comprising the following steps:

S1. The wireless sensor network contains m sensor nodes. When faulty sensor nodes interfere, each set containing different sensors is defined as an arm, giving K arms in total, K = 2^m - 2. Initialize the number of times each arm has been selected, N_i = 0, the estimated value of each arm, Q_i(0) = 1, and the probability of each arm being selected at the initial moment, p_i = 1/K.

S2. Randomly select one arm according to the selection probabilities. All sensor nodes in the selected arm perform one phase update according to the random-grouping distributed beamforming method; the selection count of the selected arm is increased by one, its reward value is updated, and its estimated value is updated.

S3. For each arm, update the probability of being selected according to the Boltzmann distribution.

S4. Repeat steps S2 and S3 until the arm containing all the normal sensor nodes is selected and the power of the signal sent by the sensor nodes is optimal.
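As a hedged illustration of step S1, the following Python sketch enumerates the arms under the assumption that an arm is any non-empty proper subset of the m sensor indices, which yields K = 2^m - 2. The function names `enumerate_arms` and `init_bandit` are hypothetical and do not come from the patent.

```python
from itertools import combinations


def enumerate_arms(m):
    """Return every non-empty proper subset of the sensor indices 0..m-1.

    Assumption: an "arm" is one such subset, so len(result) == 2**m - 2.
    """
    arms = []
    for size in range(1, m):  # sizes 1 .. m-1 exclude the empty and full sets
        arms.extend(combinations(range(m), size))
    return arms


def init_bandit(m):
    """Initialise the per-arm state described in step S1."""
    arms = enumerate_arms(m)
    k = len(arms)
    counts = [0] * k         # N_i = 0
    estimates = [1.0] * k    # Q_i(0) = 1
    probs = [1.0 / k] * k    # p_i = 1/K
    return arms, counts, estimates, probs
```

For example, `init_bandit(4)` produces 14 arms, each starting with the same selection probability 1/14.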

As a preferred solution, the phase update performed by all sensor nodes in the selected arm according to the random-grouping distributed beamforming method comprises the following steps:

S21. Initialize the phases of all sensor nodes in the selected arm.

S22. Determine the grouping probability q, 0 < q < 1, and randomly divide all sensor nodes into two groups, G₁ and G₂, according to q.

S23. The sensor nodes in group G₁ each send signals to the receiving end in four time slots and apply a phase offset according to the feedback received; the sensor nodes in group G₂ each send signals to the receiving end in four time slots, with zero phase offset in every slot.

As a preferred solution, applying the phase offset according to the feedback received comprises: after receiving the feedback in the first time slot, the phase is adjusted to π; after receiving the feedback in the second time slot, the phase is adjusted to π/2; after receiving the feedback in the third time slot, the phase is adjusted to -ψ(n)-3π/2;

where ψ(n) = arctan((1+α(n))/(1-α(n))),

α(n) = [(P(4n+2)-P(4n))/(P(4n+2)-P(4n+1))],

and P(4n), P(4n+1), P(4n+2) are the signal powers received at the receiving end in the first, second, and third time slots, respectively, with n = 1. For the phase offset in the fourth time slot: if the power P(4n+3) received at the receiving end in the fourth time slot is greater than or equal to the power P(4n) received at the initial moment of the current iteration, the phase offset is 0; otherwise, the phase offset is π.
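The quantities α(n) and ψ(n) and the fourth-slot rule above translate directly into a few lines of Python. This is a minimal sketch: the function names are hypothetical, the arguments `p0`, `p1`, `p2`, `p3` stand for P(4n), P(4n+1), P(4n+2), P(4n+3), and the degenerate cases P(4n+2) = P(4n+1) or α(n) = 1, which the text does not address, are simply left to raise `ZeroDivisionError`.

```python
import math


def alpha(p0, p1, p2):
    """alpha(n) = (P(4n+2) - P(4n)) / (P(4n+2) - P(4n+1))."""
    return (p2 - p0) / (p2 - p1)


def psi(p0, p1, p2):
    """psi(n) = arctan((1 + alpha(n)) / (1 - alpha(n)))."""
    a = alpha(p0, p1, p2)
    return math.atan((1 + a) / (1 - a))


def third_slot_offset(p0, p1, p2):
    """Phase offset after the third-slot feedback: -psi(n) - 3*pi/2."""
    return -psi(p0, p1, p2) - 3 * math.pi / 2


def fourth_slot_offset(p0, p3):
    """0 if P(4n+3) >= P(4n), otherwise pi."""
    return 0.0 if p3 >= p0 else math.pi
```

With received powers 1, 2, 3 in the first three slots, α(n) is 2 and ψ(n) is arctan(-3).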

As a preferred solution, the reward value of the selected arm is updated by a formula that is not preserved in this text, in which P_i(old) is the signal power produced when the nodes in the selected arm send signals to the receiving end without any phase offset; P_i(new) is the signal power produced when the nodes in the selected arm send signals to the receiving end after one phase update; and T′ is a constant greater than 0.

As a preferred solution, the estimated value of the selected arm is updated as follows:

Q_i(N_i) = Q_i(N_i-1) + β[R_i(N_i) - Q_i(N_i-1)], 0 < β < 1.
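The estimate update is an exponential moving average of the rewards with step size β. A minimal sketch, with the hypothetical function name `update_estimate`:

```python
def update_estimate(q_prev, reward, beta):
    """Q_i(N_i) = Q_i(N_i - 1) + beta * (R_i(N_i) - Q_i(N_i - 1)), 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must satisfy 0 < beta < 1")
    return q_prev + beta * (reward - q_prev)
```

A larger β makes the estimate follow recent rewards more closely, while a smaller β averages over a longer history.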

As a preferred solution, the probability of each arm being selected is updated by a formula that is not preserved in this text.
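Because the update formula itself is missing here, the following sketch uses the textbook Boltzmann (softmax) form p_i = exp(Q_i/τ) / Σ_j exp(Q_j/τ) as a stand-in. This is an assumption rather than the patent's stated formula, and the temperature `tau` and both function names are hypothetical.

```python
import math
import random


def boltzmann_probabilities(estimates, tau):
    """Assumed softmax over the arm estimates with temperature tau > 0."""
    top = max(estimates)  # subtracting the maximum keeps exp() from overflowing
    weights = [math.exp((q - top) / tau) for q in estimates]
    total = sum(weights)
    return [w / total for w in weights]


def select_arm(probs, rng=random):
    """Draw one arm index according to the selection probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under this form a smaller `tau` concentrates probability on the arm with the highest estimate, while a larger `tau` keeps the selection closer to uniform.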

As a preferred solution, the method in step S4 for making the power of the signal sent by the sensor nodes optimal is: randomly group the sensor nodes in the arm containing all the normal sensor nodes, adjust their phases according to the feedback from the receiving end, and iterate repeatedly until the signal power received at the receiving end reaches its optimal value.

As a preferred solution, the method in step S4 for making the power of the signal sent by the sensor nodes optimal specifically comprises the following steps:

S41. Initialize the phase of each sensor node in the arm containing all the normal sensor nodes.

S42. Determine the grouping probability q, 0 < q < 1, and randomly divide all sensor nodes into two groups, G₁ and G₂, according to q.

S43. Each iteration comprises four time slots. The sensor nodes in group G₁ each send a signal to the receiving end in every time slot and apply a phase offset according to the feedback received; the sensor nodes in group G₂ each send a signal to the receiving end in every time slot, with zero phase offset in every slot.

S44. Repeat steps S42 and S43 until the signal power received at the receiving end reaches its optimal value.

As a preferred solution, each sensor node is assigned to group G₁ with probability q and to group G₂ with probability 1-q.

P(4n), P(4n+1), P(4n+2) are the signal powers received at the receiving end in the first, second, and third time slots of the nth iteration, respectively. For the phase offset in the fourth time slot: if the power P(4n+3) received at the receiving end in the fourth time slot is greater than or equal to the power P(4n) received at the initial moment of the current iteration, the phase offset is 0; otherwise, the phase offset is π.

Compared with the prior art, the present invention has the following beneficial effects:

The resilient beamforming method based on the multi-armed bandit in a wireless sensor network treats each set of different sensors as one arm of a multi-armed bandit problem and selects arms according to certain probabilities. Using the idea of the multi-armed bandit problem, it finds the optimal arm, namely the set of all normal sensors, and thereby avoids the influence of the faulty sensors. The signals of all normal sensors are finally combined coherently at the receiving end, so that the signal power sent by the sensor nodes is optimal. The method is suitable for excluding faulty sensors from a wireless sensor network, improves the stability of the system, and can increase the signal transmission power.

Brief Description of the Drawings

FIG. 1 is a flowchart of the resilient beamforming method based on the multi-armed bandit in a wireless sensor network according to an embodiment of the present invention;

FIG. 2 is a plot of the probability that the normal sensor nodes are selected during the iterations of the method according to the embodiment of the present invention;

FIG. 3 is a plot of the optimal signal transmission power during the iterations of the method according to the embodiment of the present invention;

FIG. 4 is a flowchart of the RG-DB algorithm used in the method according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of the operations in each time slot during one iteration of the RG-DB algorithm used in the method according to the embodiment of the present invention.

Detailed Description

To describe the embodiments of the present invention more clearly, specific embodiments are described below with reference to the accompanying drawings. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings and other embodiments from them without creative effort.

The invention provides a resilient beamforming method based on the multi-armed bandit in a wireless sensor network. It aims to solve the problem that, because of faulty sensors, the power of the signal sent by the sensor nodes cannot reach its optimal value at the receiving end, and to find a node strategy that excludes faulty sensors while keeping the energy consumption of the network low. It also uses a random-grouping distributed beamforming method (Random Grouping Distributed Beamforming algorithm, RG-DB algorithm for short) to adjust the phases of the selected sensor nodes, so that the power of the signal received at the receiving end is maximized.

The main idea of the method according to the embodiment is to convert the problem of excluding faulty sensors into the problem of selecting the best arm of a multi-armed bandit. An appropriate immediate reward is constructed, the phases of the selected arm (a group of sensor nodes) are adjusted with the RG-DB algorithm, and the probability of that arm being selected is then updated. In the end all faulty sensor nodes are excluded and the signal power received at the receiving end is maximized. This is the Multi-Armed Bandit based Resilient Beamforming algorithm, MAB-RB algorithm for short.

In view of the application characteristics of this type of sensor network, the invention establishes the following network model. The wireless sensor network contains m sensor nodes, each of which can send a signal A·s(t) to the receiving end, where A is the amplitude of the transmitted signal, s(t) = e^(iωt), i is the imaginary unit, and ω is the carrier frequency. Each sensor node can also adjust its own phase offset according to the feedback from the receiving end, and the transmitted signal spreads uniformly in all directions.

Each sensor node in the system is assumed to have its own local oscillator, which can be synchronized to the carrier frequency ω. Time is divided into periods of length T, T = T_x + T_r, where T_x is the period during which the sensors send signals and T_r is the period during which the nodes receive feedback. In the RG-DB algorithm each iteration comprises four time slots, that is, the nth iteration occurs during the time interval [4nT, 4nT+4T]. The operations of one iteration are shown in FIG. 5. The signal experiences some delay during propagation and attenuates with propagation distance.
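The timing above can be made concrete with a small helper. This sketch assumes that each of the four slots of iteration n has length T, so that slot k (k = 0..3) spans [(4n+k)T, (4n+k+1)T]; the text implies this but does not state it, and the function name is hypothetical.

```python
def slot_interval(n, k, period):
    """Return (start, end) of slot k (0..3) in iteration n, each slot lasting `period`."""
    if k not in (0, 1, 2, 3):
        raise ValueError("k must be 0, 1, 2 or 3")
    start = (4 * n + k) * period
    return start, start + period
```

With this indexing, the power measurement P(4n+k) corresponds to slot k of iteration n.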

The RG-DB algorithm is used mainly in wireless sensor networks. In each iteration the sensor nodes are randomly divided into two groups according to the grouping probability q, send signals in every time slot, and apply phase offsets according to the feedback returned by the receiving end. After many iterations all signals combine coherently at the receiving end, which maximizes the signal power there. When faulty sensor nodes interfere with the system, each set containing different sensors is defined as an arm, giving K arms in total, where K = 2^m - 2 (the cases of all sensors being normal and all sensors being faulty are excluded). The resilient beamforming problem thus becomes a MAB problem whose goal is to select the best arm, the one containing only normal sensors.

To adjust the phase offsets of the sensors in the selected arm, the random-grouping distributed beamforming (RG-DB) algorithm is used. Its steps, shown in FIG. 4, are as follows:

Step A. Network initialization: initialize the phase of each node in the network and determine the grouping probability q, 0 < q < 1.

Step B. Divide all sensor nodes into two groups: each node is assigned to group G₁ with probability q and to group G₂ with probability 1-q.

Step C. Each node in group G₁ sends a signal to the receiving end with the phase corresponding to each time slot and receives the feedback. As shown in FIG. 5, the phase adjustment strategy is: the phase offset in the first time slot is π, the phase offset in the second time slot is π/2, and the phase offset in the third time slot is -ψ(n)-3π/2;

where ψ(n) is computed as:

ψ(n) = arctan((1+α(n))/(1-α(n))),

α(n) = [(P(4n+2)-P(4n))/(P(4n+2)-P(4n+1))], and P(4n), P(4n+1), P(4n+2) are the signal powers received at the receiving end in the first, second, and third time slots of the nth iteration, respectively. In addition, for the sensor nodes in group G₁, if the power P(4n+3) received at the receiving end in the fourth time slot is greater than or equal to the power P(4n) received at the initial moment of the iteration, the phase offset for the fourth time slot is 0; otherwise it is π.

Step D. The sensor nodes in group G₂ each send a signal to the receiving end in every time slot and perform no phase update in any of the four slots, that is, the phase offset in every slot is zero.

Step E. Repeat steps B, C, and D. After many cycles the signal power at the receiving end reaches its optimal value.

Through these steps the phase offsets of the nodes are adjusted until the signal power at the receiving end is maximized, which achieves optimal beamforming.
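Two building blocks of the steps above, the random grouping of Step B and the power that the receiving end reports back, can be sketched as follows. The grouping follows the text. The power model is an assumption made for illustration: the received signal is taken to be the sum of unit phasors scaled by a common amplitude, and propagation delay and noise are ignored. The function names are hypothetical.

```python
import cmath
import math
import random


def split_groups(nodes, q, rng):
    """Step B: each node goes to G1 with probability q, otherwise to G2."""
    g1, g2 = [], []
    for node in nodes:
        (g1 if rng.random() < q else g2).append(node)
    return g1, g2


def received_power(phases, amplitude=0.5):
    """Assumed model: power = |sum_k amplitude * exp(i * phase_k)|^2."""
    total = sum(amplitude * cmath.exp(1j * p) for p in phases)
    return abs(total) ** 2


rng = random.Random(0)
g1, g2 = split_groups(list(range(6)), q=0.4, rng=rng)
aligned = received_power([0.0] * 6)          # all six phases equal
opposed = received_power([0.0, math.pi])     # two signals in antiphase cancel
```

Under this model, six perfectly aligned signals of amplitude 0.5 add up to amplitude 3 and therefore a power of 9, while two signals in antiphase cancel.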

The MAB-RB algorithm treats each set of different sensors as one arm of a multi-armed bandit problem, selects arms according to certain probabilities, and uses the idea of the multi-armed bandit problem to find the optimal arm (the set of all normal sensors), thereby avoiding the influence of faulty nodes.

This requires an appropriate immediate reward, together with an arm estimate Q_i(N_i) that evaluates the performance of arm i, where N_i is the number of times arm i has been selected. Specifically, as shown in FIG. 1, the resilient beamforming method based on the multi-armed bandit in a wireless sensor network according to this embodiment comprises the following steps:

Step 1. Network initialization. In a wireless sensor network containing m sensor nodes, when faulty sensor nodes interfere, there are K = 2^m - 2 arms in total. Initialize the number of times each arm has been selected, N_i = 0, the estimated value of each arm, Q_i(0) = 1, and the probability of each arm being selected, p_i = 1/K.

Step 2. Randomly select one arm according to the selection probabilities. All sensor nodes in the selected arm perform one phase update according to the random-grouping distributed beamforming method, that is, steps A to D of the RG-DB algorithm above, also called one iteration. The selection count of the selected arm is then increased by one, its reward value is updated, and its estimated value is updated.

The reward value of the arm is computed by a formula that is not preserved in this text, in which P_i(old) is the signal power produced when the nodes in the selected arm send signals to the receiving end without any phase offset; P_i(new) is the signal power produced when the nodes in the selected arm send signals to the receiving end after one phase update; and T′ is a constant greater than 0.

The estimated value of the selected arm is updated as follows:

Q_i(N_i) = Q_i(N_i-1) + β[R_i(N_i) - Q_i(N_i-1)], where 0 < β < 1 is a constant.

Step 3. For every arm, update the probability of being selected according to the Boltzmann distribution. The update formula is not preserved in this text; the parameter it introduces is a constant.

Step 4. Repeat steps 2 and 3. After many iterations the arm containing all the normal nodes is selected, and the signal power sent by the sensor nodes reaches its optimal value.
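Steps 1 to 4 can be assembled into a short control loop. The sketch below is illustrative only. The reward formula is not reproduced in this text, so the reward is supplied by the caller as `reward_fn`; in the method described here it would come from the power before and after one RG-DB phase update. The Boltzmann update uses the textbook softmax form with an assumed temperature `tau`. All names and default values are hypothetical.

```python
import math
import random


def mab_rb_loop(num_arms, reward_fn, beta=0.1, tau=0.1, rounds=1000, seed=0):
    """Run the select / reward / estimate / probability-update cycle.

    reward_fn(arm_index, rng) is a caller-supplied stand-in for the
    patent's reward computation.
    """
    rng = random.Random(seed)
    counts = [0] * num_arms                 # N_i = 0
    estimates = [1.0] * num_arms            # Q_i(0) = 1
    probs = [1.0 / num_arms] * num_arms     # p_i = 1/K
    for _ in range(rounds):
        # Step 2: draw an arm, observe its reward, update count and estimate.
        i = rng.choices(range(num_arms), weights=probs, k=1)[0]
        counts[i] += 1
        reward = reward_fn(i, rng)
        estimates[i] += beta * (reward - estimates[i])
        # Step 3: assumed softmax (Boltzmann) update of all probabilities.
        top = max(estimates)
        weights = [math.exp((q - top) / tau) for q in estimates]
        total = sum(weights)
        probs = [w / total for w in weights]
    return counts, estimates, probs


# Toy usage: arm 2 always earns reward 1, every other arm earns 0.
counts, estimates, probs = mab_rb_loop(
    4, lambda i, rng: 1.0 if i == 2 else 0.0, rounds=2000
)
```

In this toy run the estimate of arm 2 stays at 1 while the others decay each time they are drawn, so the selection probability concentrates on arm 2.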

The method by which the signal power sent by the sensor nodes reaches its optimal value includes:

S42. Determine the grouping probability q, 0 < q < 1, and randomly divide all sensor nodes into two groups, G₁ and G₂, according to q; each sensor node is assigned to group G₁ with probability q and to group G₂ with probability 1-q.

S43. Each iteration comprises four time slots. The sensor nodes in group G₁ each send a signal to the receiving end in every time slot and apply a phase offset according to the feedback received; the sensor nodes in group G₂ each send a signal to the receiving end in every time slot, with zero phase offset in every slot. Applying the phase offset according to the feedback received comprises: after receiving the feedback in the first time slot, the phase is adjusted to π; after receiving the feedback in the second time slot, the phase is adjusted to π/2; after receiving the feedback in the third time slot, the phase is adjusted to -ψ(n)-3π/2;

where ψ(n) = arctan((1+α(n))/(1-α(n))),

α(n) = [(P(4n+2)-P(4n))/(P(4n+2)-P(4n+1))].

本发明的无线传感器网络中基于多臂赌博机的弹性波束形成方法的具体应用示例如下：The specific application example of the elastic beam forming method based on the multi-arm gambling machine in the wireless sensor network of the present invention is as follows:

一、网络拓扑以及参数设定，设置8个传感器，包括6个正常传感器和2个故障传感器，故障传感器的相位总是随时间变化并且是不可控的；所有传感器具有相同的衰减幅度υ_ιA＝0.5，分组概率q＝0.4；为了简化实施例，假设故障传感器数目已知，那么臂数

1. Network topology and parameter setting, set 8 sensors, including 6 normal sensors and 2 faulty sensors, the phase of the faulty sensor always changes with time and is uncontrollable; all sensors have the same attenuation amplitude υ _ι A =0.5, grouping probability q=0.4; to simplify the embodiment, assuming that the number of faulty sensors is known, then the number of arms

2. Initialize the estimated value of each arm to Q_i(0) = 1, set the number of times each arm has been selected to N_i = 0, and set the probability of each arm being selected to p_i = 1/K;

3. Randomly select an arm according to the selection probabilities; the nodes contained in that arm perform one iteration of the RG-DB algorithm, and the selection count of that arm is increased by one; then update the reward, the estimated value, and the selection probability of that arm according to the corresponding formulas;

4. Iterate step 3 repeatedly. Eventually the probability that all correct sensors are selected tends to 1, as shown in Figure 2, and the signal power of all normal sensors converges to the maximum value of 9 dB, as shown in Figure 3.

It follows that, in the end, every selection picks all the normal sensor nodes, that is, the optimal arm. At that point the faulty sensors are excluded and the signal transmission power reaches its maximum.
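The selection loop of this example can be sketched in Python as follows. It is a minimal sketch under stated assumptions rather than the method as claimed: `reward_fn` is a hypothetical stand-in for one RG-DB iteration followed by the reward computation, the estimate update follows Q_i(N_i) = Q_i(N_i - 1) + β[R_i(N_i) - Q_i(N_i - 1)] from the claims, the probabilities are recomputed with a standard Boltzmann (softmax) form, and the `temperature`, `beta`, and `seed` values are assumed for illustration.

```python
import math
import random


def boltzmann(q_values, temperature):
    """Softmax over the arm estimates; subtracting the max avoids overflow."""
    top = max(q_values)
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(reward_fn, num_arms, steps, beta=0.1, temperature=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [1.0] * num_arms          # Q_i(0) = 1
    counts = [0] * num_arms               # N_i = 0
    probs = [1.0 / num_arms] * num_arms   # p_i = 1/K
    for _ in range(steps):
        # Draw one arm according to the current selection probabilities.
        arm = rng.choices(range(num_arms), weights=probs)[0]
        counts[arm] += 1
        reward = reward_fn(arm)           # hypothetical: RG-DB step + reward
        estimates[arm] += beta * (reward - estimates[arm])
        probs = boltzmann(estimates, temperature)
    return estimates, counts, probs


# Toy usage (assumed rewards): arm 2 always pays 1.0, every other arm 0.0.
estimates, counts, probs = run_bandit(lambda arm: 1.0 if arm == 2 else 0.0,
                                      num_arms=4, steps=500)
```

In this toy run the estimate of arm 2 stays at 1 while the others decay toward 0, so the selection probability concentrates on arm 2, mirroring the behaviour the example describes for the arm made up of all normal sensors.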

The above is only a detailed description of the preferred embodiments and principles of the present invention. Those of ordinary skill in the art may, following the ideas provided by the present invention, make changes to the specific implementation, and such changes shall also be regarded as falling within the protection scope of the present invention.

Claims

1. An elastic beamforming method based on a multi-arm gambling machine in a wireless sensor network, characterized by comprising the following steps:

S1. The wireless sensor network contains m sensor nodes; when faulty sensor nodes cause interference, each set containing different sensors is defined as an arm, giving K arms in total, K = 2^m - 2; initialize the number of times each arm has been selected, N_i = 0, the estimated value of each arm, Q_i(0) = 1, and the probability of each arm being selected at the initial moment, p_i = 1/K;

S2. Randomly select an arm according to the selection probabilities; all sensor nodes in the selected arm perform one phase update according to the random grouping distributed beamforming method; increase the selection count of the selected arm by one, update the reward value corresponding to the selected arm, and update the estimated value of the selected arm;

S3. Update the probability of being selected for each arm according to the Boltzmann distribution;

S4. Repeat steps S2 and S3 until the arm corresponding to all normal sensor nodes is selected, and make the power of the signal sent by the sensor nodes optimal.

2. The elastic beamforming method based on a multi-arm gambling machine in the wireless sensor network according to claim 1, wherein the phase update performed by all sensor nodes in the selected arm according to the random grouping distributed beamforming method comprises the following steps:

S21. Initialize the phases of all sensor nodes in the selected arm;

S22. Determine the grouping probability q, 0 < q < 1; randomly divide all sensor nodes into two groups G₁ and G₂ according to the grouping probability q;

S23. The sensor nodes in group G₁ each send signals to the receiving end in four time slots and perform phase shifts according to the received feedback information; the sensor nodes in group G₂ each send signals to the receiving end in four time slots, with zero phase offset in each time slot.

3. The elastic beamforming method based on a multi-arm gambling machine in a wireless sensor network according to claim 2, wherein performing the phase shift according to the received feedback information comprises: after receiving the feedback in the first time slot, the phase is adjusted to π; after receiving the feedback in the second time slot, the phase is adjusted to π/2; after receiving the feedback in the third time slot, the phase is adjusted to -ψ(n) - 3π/2;

Where, ψ(n)=arctan((1+α(n))/(1-α(n))),

α(n)=[(P(4n+2)-P(4n))/(P(4n+2)-P(4n+1))],

P(4n), P(4n+1), P(4n+2) represent the signal power received by the receiving end in the first, second, and third time slots, respectively, where n=1; for the phase offset of the fourth time slot, if the power value P(4n+3) received by the receiving end in the fourth time slot is greater than or equal to the power value P(4n) received at the initial moment of the iteration stage, the phase offset is 0; otherwise, the phase offset is π.

4. The elastic beamforming method based on a multi-arm gambling machine in the wireless sensor network according to claim 3, wherein the method for updating the reward value corresponding to the selected arm is:

where P_i(old) is the signal power produced when the nodes contained in the selected arm send signals to the receiving end without any phase offset; P_i(new) is the signal power produced when the nodes contained in the selected arm send signals to the receiving end after one phase update; T' is a constant greater than 0.

5. The elastic beamforming method based on a multi-arm gambling machine in the wireless sensor network according to claim 4, wherein the method for updating the estimated value of the selected arm is:

Q_i(N_i) = Q_i(N_i - 1) + β[R_i(N_i) - Q_i(N_i - 1)], 0 < β < 1.

6. The elastic beamforming method based on a multi-arm gambling machine in the wireless sensor network according to claim 5, wherein the method for updating the probability of each arm being selected is:

7. The elastic beamforming method based on a multi-arm gambling machine in the wireless sensor network according to any one of claims 1-6, wherein in step S4 the power of the signal sent by the sensor nodes is made optimal as follows: the sensor nodes in the arm corresponding to all normal sensor nodes are randomly grouped and adjust their phases according to the feedback from the receiving end over multiple iterations, until the signal power received by the receiving end reaches the optimal value.

8. The elastic beamforming method based on a multi-arm gambling machine in a wireless sensor network according to claim 7, wherein the method for making the power of the signal sent by the sensor nodes optimal in step S4 specifically comprises the following steps:

S41. Initialize the phase of each sensor node in the arm corresponding to all normal sensor nodes;

S42. Determine the grouping probability q, 0 < q < 1; randomly divide all sensor nodes into two groups G₁ and G₂ according to the grouping probability q;

S43. Each iteration comprises four time slots; the sensor nodes in group G₁ send signals to the receiving end in each time slot and perform phase shifts according to the received feedback information; the sensor nodes in group G₂ send signals to the receiving end in each time slot, with zero phase offset in each time slot;

S44. Repeat steps S42 and S43 until the signal power received by the receiving end reaches an optimal value.

9. The elastic beamforming method based on a multi-arm gambling machine in a wireless sensor network according to claim 8, wherein a sensor node is assigned to group G₁ with probability q and to group G₂ with probability 1 - q.

10. The elastic beamforming method based on a multi-arm gambling machine in a wireless sensor network according to claim 8, wherein performing the phase shift according to the received feedback information comprises: after receiving the feedback in the first time slot, the phase is adjusted to π; after receiving the feedback in the second time slot, the phase is adjusted to π/2; after receiving the feedback in the third time slot, the phase is adjusted to -ψ(n) - 3π/2;

Where, ψ(n)=arctan((1+α(n))/(1-α(n))),

α(n)=[(P(4n+2)-P(4n))/(P(4n+2)-P(4n+1))],

P(4n), P(4n+1), P(4n+2) represent the signal power received by the receiving end in the first, second, and third time slots of the nth iteration, respectively; for the phase offset of the fourth time slot, if the power value P(4n+3) received by the receiving end in the fourth time slot is greater than or equal to the power value P(4n) received at the initial moment of the iteration stage, the phase offset is 0; otherwise, the phase offset is π.