CN112688809B - Diffusion self-adaptive network learning method, system, terminal and storage medium - Google Patents

Diffusion self-adaptive network learning method, system, terminal and storage medium

Info

Publication number
CN112688809B
Authority
CN
China
Prior art keywords
diffusion
strategy
variance
random gradient
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011521741.6A
Other languages
Chinese (zh)
Other versions
CN112688809A (en)
Inventor
张萌飞
靳丹琦
陈捷
雷攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenggeng Intelligent Technology Xi'an Research Institute Co ltd
Original Assignee
Shenggeng Intelligent Technology Xi'an Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenggeng Intelligent Technology Xi'an Research Institute Co ltd filed Critical Shenggeng Intelligent Technology Xi'an Research Institute Co ltd
Priority to CN202011521741.6A
Publication of CN112688809A
Application granted
Publication of CN112688809B
Legal status: Active


Abstract

A diffusion self-adaptive network learning method, system, terminal and storage medium. The method comprises a stochastic gradient descent process and a time-averaged variance-reduced stochastic gradient descent process. In the stochastic gradient descent process, each node of the distributed network runs the least-mean-square strategy P_k times and collects the input data received during these P_k runs. In the time-averaged variance-reduced stochastic gradient descent process, the stochastic gradients over a time window of length P_k, formed from the previously collected data, are averaged to obtain an estimate of the average gradient, and this estimate is used in the next m iterations to update the variance-reduction weight equation. A diffusion self-adaptive network learning system, terminal and storage medium are also provided. The method overcomes the limitation that traditional variance-reduced stochastic gradient descent algorithms cannot be used in an online learning environment, and applying it to the adaptive diffusion network algorithm improves the online estimation performance of the distributed diffusion network.

Description

Diffusion self-adaptive network learning method, system, terminal and storage medium
Technical Field
The invention belongs to the field of adaptive signal processing and relates to a diffusion self-adaptive network learning method, system, terminal and storage medium for realizing online learning of a diffusion adaptive network based on stochastic-gradient variance reduction.
Background
In a multi-node network, the dispersed physical locations of the nodes, the limited communication capacity between them, and requirements such as security and robustness mean that the network cannot transfer data on a large scale and gather all of it at a central node for analysis under a centralized strategy; this gives rise to the need for distributed processing. In the context of big data, data are typically collected as a stream over time and the system model or parameters must be re-estimated at every instant; moreover, the system model and parameter states may drift over time, which gives rise to the need for online processing.
Adaptive algorithms in distributed networks arose precisely to meet these needs. Over the last decade the field has conducted extensive research on adaptive algorithms in distributed networks and explored their applications. According to the cooperation mode and information-flow pattern between nodes, cooperation strategies in distributed networks fall mainly into three types: the incremental strategy, the consensus strategy, and the diffusion strategy. The incremental strategy exchanges information by forming a Hamiltonian cycle in the network and visiting each node in turn. Although the communication load required by the incremental strategy is small in theory, constructing a Hamiltonian cycle in an arbitrary network is itself an NP-hard problem. Furthermore, such a cycle is very sensitive to node or link failures, so the incremental strategy is not well suited to distributed online adaptive signal processing. In the consensus strategy and the diffusion strategy, each node communicates with its neighbours in real time and cooperatively estimates the global target parameters of the network through the information exchanged with them. Because each node must obtain the information of all its neighbours at every instant, these two strategies require more communication resources than the incremental strategy, but they make full use of node cooperation in the distributed network structure. In addition, the diffusion strategy gives nodes the ability to adapt and learn continuously, scales very well, and has been shown to possess better stability and dynamic range than the consensus strategy, which makes it an important subject of research in distributed adaptive signal processing. The distributed adaptive diffusion least-mean-square algorithm is essentially a stochastic gradient descent algorithm, and the gradient noise of the stochastic gradient greatly hinders its fast convergence. Research on how to reduce the influence of gradient noise is therefore of great importance for improving diffusion-based distributed online learning algorithms. Among the many alternatives, applying a variance-reduced stochastic gradient algorithm to the distributed adaptive network is one natural direction. Variance-reduced stochastic gradient algorithms are designed to minimize a loss function defined over all data samples of a batch. Typical algorithms include the stochastic variance reduced gradient (SVRG) algorithm and the SAGA algorithm. The SVRG algorithm uses two loops: the true gradient is computed in the outer loop, and variance-reduced stochastic gradient descent is performed in the inner loop. The SAGA algorithm, in contrast, runs only a single loop but needs more memory to estimate the true gradient. These algorithms improve markedly on the original stochastic gradient descent algorithm in performance; however, their design is based on samples collected in batches rather than on online learning from the streaming data in the present problem.
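For reference, a minimal single-node sketch of the SVRG scheme described above, written here for the batch least-squares loss Σ_i (d_i - wᵀx_i)²; the function signature, step size and data layout are illustrative assumptions rather than anything specified in this patent:

```python
import numpy as np

def svrg(X, d, epochs=10, m=None, mu=0.01, seed=0):
    """SVRG on the batch least-squares loss sum_i (d_i - w^T x_i)^2.

    Outer loop: compute the full (true) gradient at a snapshot w_tilde.
    Inner loop: m variance-reduced stochastic-gradient steps.
    """
    rng = np.random.default_rng(seed)
    n, L = X.shape
    m = m or n                                         # inner-loop length, often ~ batch size
    w = np.zeros(L)
    grad = lambda w_, x, di: -2.0 * (di - x @ w_) * x  # per-sample gradient
    for _ in range(epochs):
        w_tilde = w.copy()                             # snapshot of the current weights
        full_grad = np.mean([grad(w_tilde, X[i], d[i]) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            g = grad(w, X[i], d[i]) - grad(w_tilde, X[i], d[i]) + full_grad
            w = w - mu * g                             # variance-reduced update
    return w
```

The outer loop needs the entire batch to form the true gradient, which is exactly what the streaming setting addressed by the invention does not provide.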
Disclosure of Invention
The invention aims to provide a diffusion self-adaptive network learning method, system, terminal and storage medium that address the problem in the prior art that variance-reduced stochastic gradient descent algorithms cannot be used in an online learning environment; the method can be applied to online learning of a diffusion adaptive network to improve the online estimation performance of the distributed diffusion network.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a diffusion self-adaptive network learning method comprises a random gradient descent process and a time average variance reduction random gradient descent process, wherein each node of a distributed network runs P in the random gradient descent process k A second least mean square strategy and collect this P k Input data received in the secondary operation; during the time average reduced variance random gradient descent, the data collected before is used for the length P k Averaging the random gradients in the time window to obtain an estimated value of the average gradient, and updating a weight equation for reducing variance by using the estimated value in the next m iterative computations; at the very beginning P k At each moment, a random gradient descent process is executed when the number of iterations is greater than P k Then, calculating the average gradient under the window function to reduce the variance of random gradient and speed up the self-adaption of the whole diffusion networkThe convergence speed of the algorithm should be calculated.
As a preferable scheme of the diffusion adaptive network learning method of the invention:
The stochastic gradient descent strategy is executed over the distributed network: node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is executed repeatedly and stops once i exceeds the window-function length P_k.
As a preferable scheme of the diffusion adaptive network learning method of the invention:
the global cost function in the network of the random gradient descent strategy is in the form ofWherein->N represents the total number of nodes in the network, and symbol E (·) represents the data x k,i Is expected from the distribution of (a).
As a preferable scheme of the diffusion adaptive network learning method of the invention:
the first-order and second-order convergence step length of the node k of the random gradient descent strategy meets the following conditions
Wherein delta k Representing a cost function J k Gradient vector of (-) satisfies delta k Lipschitz continuous conditions.
As a preferable scheme of the diffusion adaptive network learning method of the invention:
From i > P_k onward, the distributed network executes the variance-reduced stochastic gradient descent strategy: the value of w_{k,i-1} is first assigned to the inner-loop variable (the snapshot weight), and the average gradient is then estimated with a window function by averaging the stochastic gradients over the collected window, where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k(·) with respect to w_{k,i-1} when the input is the signal x_{k,i}.
As a preferable scheme of the diffusion adaptive network learning method of the invention:
The number of inner-loop iterations m and the window-function length P_k satisfy a preset relationship; they are generally set to be of comparable size.
as a preferable scheme of the diffusion adaptive network learning method of the invention: in the next m iterative calculations, the resulting internal loop variable is utilizedAverage gradient->Calculating a random gradient to reduce varianceAnd then the node k obtains the estimation result w of the diffusion strategy at the moment i k,i The method comprises the steps of carrying out a first treatment on the surface of the After m inner loop iterations are performed, the inner loop variable is updated again>Until the algorithm converges;
in a random gradient of reduced variance, the first-order convergence step size of node k satisfiesSecond order convergence step size satisfies->Wherein v is k Representing cost function->Is v k -strongly convex.
The invention also provides a diffusion self-adaptive network learning system, which comprises:
a stochastic gradient descent execution module, for making each node of the distributed network run the least-mean-square strategy P_k times and collecting the input data received during these P_k runs;
a time-averaged variance-reduced stochastic gradient descent execution module, for averaging the stochastic gradients over a time window of length P_k formed from the data collected by the stochastic gradient descent execution module to obtain an estimate of the average gradient, and using this estimate in the next m iterations to update the variance-reduction weight equation;
a timing control module, for controlling the execution of the stochastic gradient descent process at the first P_k time instants and, once the number of iterations exceeds P_k, controlling the computation of the average gradient under the window function, thereby reducing the variance of the stochastic gradient.
The invention also provides a terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the diffusion self-adaptive network learning method when executing the computer program.
The present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the diffusion adaptive network learning method.
Compared with the prior art, the invention has the following beneficial effects: the diffusion self-adaptive network learning method is suited to accelerating stochastic-gradient convergence in a streaming-data processing environment; it overcomes the limitation that the traditional variance-reduced stochastic gradient descent algorithm cannot be used in an online learning environment, and applying it to the adaptive diffusion network algorithm improves the online estimation performance of the distributed diffusion network. The invention effectively reduces the gradient noise in the online estimation of the distributed diffusion network, thereby accelerating the convergence speed of the algorithm and improving its performance. The invention also has a degree of extensibility: it is not limited to the diffusion strategy and can be applied to other distributed strategies, such as the incremental strategy and the consensus strategy.
Drawings
FIG. 1 is a schematic diagram of an implementation of the diffusion adaptive network learning method of the present invention;
FIG. 2 is a flow chart of a design of the diffusion adaptive network learning method of the present invention;
FIG. 3 is a graph of simulation results of the variance-reduced diffusion strategy of the present invention for L = 50 and N = 16 network nodes, with the loss-function model J_k(w; x_{k,i}) = (d_{k,i} - w^T x_{k,i})^2.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The signal model and related quantities of the problem studied by the invention are as follows:
Consider a distributed network consisting of N nodes. An unknown parameter vector of length L×1 is to be estimated, and an input vector x_{k,i} of length L×1 can be observed at each node k at time i.
The invention provides a diffusion self-adaptive network learning method, which comprises the following steps:
S1: The stochastic gradient descent strategy is executed over the distributed network: node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is executed repeatedly and stops once i exceeds the window-function length P_k.
In the stochastic gradient descent strategy, the global cost function of the network is the sum over all nodes of the local costs J_k(w) = E[J_k(w; x_{k,i})], where N denotes the total number of nodes in the network and the symbol E(·) denotes the expectation over the distribution of the data x_{k,i}.
In the stochastic gradient descent strategy, the first-order and second-order convergence step sizes of node k satisfy bounds determined by δ_k, where δ_k is the constant for which the gradient vector of the cost function J_k(·) satisfies a δ_k-Lipschitz continuity condition.
S2: From i > P_k onward, the distributed network executes the variance-reduced stochastic gradient descent strategy.
First, the value of w_{k,i-1} is assigned to the inner-loop variable (the snapshot weight).
The number of inner loops m and the window-function length P_k from step S1 are of similar size, and m is generally set according to P_k.
S3: The average gradient is estimated using a window function by averaging the stochastic gradients over the collected window, where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k(·) with respect to w_{k,i-1} when the input is the signal x_{k,i}.
S4: In the next m iterations (the inner loop), the variance-reduced stochastic gradient is computed from the inner-loop variable obtained in step S2 and the average gradient obtained in step S3; node k then obtains the estimate w_{k,i} of the diffusion strategy at time i.
In the variance-reduced stochastic gradient descent strategy, the first-order and second-order convergence step sizes of node k satisfy bounds determined by ν_k, where ν_k is the parameter for which the cost function is ν_k-strongly convex.
S5: After the m inner-loop iterations have been executed, step S2 is executed again and the inner-loop variable is updated.
S6: Steps S2-S6 are repeated until the algorithm converges.
Examples
The experimental setup is as follows: d_{k,i} is obtained from a linear model in which the input x_{k,i} is filtered by the unknown parameter vector and corrupted by zero-mean Gaussian noise z_{k,i} of given variance. For convenience, this embodiment assumes that all node optima are identical, w_1 = … = w_N, with the common optimum obtained by sampling from a standard normal distribution. In the diffusion strategy, the fusion matrix is set to C = I_16; for the non-cooperative strategy, A = I_16, where the matrix I denotes an identity matrix; for the cooperative strategy, A is set to a standard combination matrix whose elements are determined by n_k, the number of neighbour nodes of node k. In the control experiment, the step size of the non-cooperative diffusion strategy is set to μ_1 = … = μ_N = 0.0012, and the step sizes of the variance-reduced diffusion strategy and the least-mean-square diffusion strategy are set to μ_1 = … = μ_N = 0.0015.
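A sketch of how such a comparison could be set up is given below; the noise level, the ring topology and the uniform neighbourhood weights are illustrative assumptions, since the exact combination-matrix formula and noise variances are not reproduced in the text above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, T = 16, 50, 5000                    # nodes, filter length, time steps

w_star = rng.standard_normal(L)           # common optimum, sampled from a standard normal
X = rng.standard_normal((N, T, L))        # Gaussian input vectors x_{k,i}
noise = 0.01 * rng.standard_normal((N, T))
D = X @ w_star + noise                    # linear model: d_{k,i} = x_{k,i}^T w + z_{k,i}

A_noncoop = np.eye(N)                     # non-cooperative strategy: A = I_16

# Cooperative strategy: a simple row-stochastic combination matrix built from a
# ring topology, with uniform weights over each node's neighbourhood of size n_k.
adj = np.eye(N)
for k in range(N):
    adj[k, (k - 1) % N] = adj[k, (k + 1) % N] = 1.0
A_coop = adj / adj.sum(axis=1, keepdims=True)

mu_noncoop = 0.0012                       # step size of the non-cooperative strategy
mu_diffusion = 0.0015                     # step size of the VR and LMS diffusion strategies
```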
As shown in FIG. 1 and FIG. 2, a diffusion adaptive network learning method includes the following steps:
S1: The stochastic gradient descent strategy is executed over the distributed network: node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is executed repeatedly and stops once i exceeds the window-function length P_k. Here x_{k,i} is a Gaussian random vector; in the comparison test P_k takes the sizes 50 and 150 respectively, the step size is set to μ_1 = … = μ_N = 0.0015, and w_{k,0} is initialized to an arbitrary value. In the stochastic gradient descent strategy, the global cost function of the network is the sum over all nodes of the local costs J_k(w) = E[J_k(w; x_{k,i})], where N denotes the total number of nodes in the network and E(·) denotes the expectation over the distribution of the data x_{k,i}. The first-order and second-order convergence step sizes of node k in the stochastic gradient descent strategy satisfy bounds determined by δ_k, where δ_k is the constant for which the gradient vector of the cost function J_k(·) satisfies a δ_k-Lipschitz continuity condition.
S2: From i > P_k onward, the distributed network executes the variance-reduced stochastic gradient descent strategy: the value of w_{k,i-1} is first assigned to the inner-loop variable (the snapshot weight), and the number of inner-loop iterations m is set to a value comparable to P_k.
S3: The average gradient is estimated using a window function by averaging the stochastic gradients over the collected window, where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k(·) with respect to w_{k,i-1} when the input is the signal x_{k,i};
S4: In the next m iterations (the inner loop), the variance-reduced stochastic gradient is computed from the inner-loop variable and the average gradient obtained above; node k then obtains the estimate w_{k,i} of the diffusion strategy at time i. In the variance-reduced stochastic gradient descent strategy, the first-order and second-order convergence step sizes of node k satisfy bounds determined by ν_k, where ν_k is the parameter for which the cost function is ν_k-strongly convex.
S5: After the m inner-loop iterations have been executed, S2 is executed again and the inner-loop variable is updated.
S6: Steps S2-S6 are repeated until the algorithm converges.
As can be seen from FIG. 3, the proposed diffusion adaptive network online learning method with reduced stochastic-gradient variance performs better than the standard least-mean-square diffusion strategy, which verifies the effectiveness of the variance-reduction technique. In addition, a larger window P_k increases the convergence speed of the algorithm compared with a smaller window, because a large P_k allows the average gradient to be estimated more accurately.
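To make steps S1 to S6 concrete, a runnable single-node sketch is given below. Since the corresponding formulas are not reproduced in the text above, the variance-reduced gradient is assumed here to take the standard SVRG-style form
\[
\widehat{\nabla J}_{k,i}=\nabla J_k(w_{k,i-1};x_{k,i})-\nabla J_k(\psi_k;x_{k,i})+\hat g_{k,i},\qquad
\hat g_{k,i}=\frac{1}{P_k}\sum_{j=i-P_k+1}^{i}\nabla J_k(\psi_k;x_{k,j}),
\]
where ψ_k is the inner-loop (snapshot) variable and ĝ_{k,i} the windowed average gradient; the names psi and g_avg, the quadratic per-sample loss and the choice m = P_k are likewise assumptions. In a full diffusion network, the weight update of each node would additionally be combined with the estimates of its neighbours through the combination matrix.

```python
import numpy as np

def grad(w, x, d):
    """Stochastic gradient of the per-sample loss J_k(w; x, d) = (d - w^T x)^2."""
    return -2.0 * (d - x @ w) * x

def vr_online_node(stream, w0, P_k, mu, m=None):
    """Time-averaged variance-reduced stochastic gradient descent at one node.

    stream : iterable of (x_i, d_i) pairs arriving over time
    w0     : initial estimate w_{k,0}
    """
    m = m if m is not None else P_k        # inner-loop count comparable to P_k
    w = np.asarray(w0, dtype=float).copy()
    window = []                            # S1: the last P_k collected samples
    psi, g_avg, inner = None, None, 0      # snapshot, windowed average gradient, counter

    for i, (x, d) in enumerate(stream, start=1):
        window = (window + [(x, d)])[-P_k:]
        if i <= P_k:                       # S1: plain stochastic-gradient (LMS) step
            g = grad(w, x, d)
        else:
            if inner == 0:                 # S2-S3: refresh snapshot and windowed average
                psi = w.copy()
                g_avg = np.mean([grad(psi, xj, dj) for xj, dj in window], axis=0)
            # S4: variance-reduced gradient (assumed SVRG-style correction)
            g = grad(w, x, d) - grad(psi, x, d) + g_avg
            inner = (inner + 1) % m        # S5/S6: after m inner steps, refresh and repeat
        w = w - mu * g
    return w
```

For example, vr_online_node(zip(X[0], D[0]), np.zeros(50), P_k=50, mu=0.0015) would replay the recorded stream of the first node sample by sample, with X and D as generated in the setup sketch above.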
The invention also provides a diffusion self-adaptive network learning system, which comprises:
the random gradient descent execution module enables each node of the distributed network to operate a sub-least mean square strategy, and collects input data received in the operation;
the time average variance reduction random gradient descent execution module uses the data collected by the random gradient descent execution module to average the random gradient in a time window with the length to obtain an estimated value of an average gradient, and uses the estimated value to update a weight equation for variance reduction in the next m iterative computations;
the time sequence control module is used for controlling and executing a random gradient descent process at the first moment, and controlling and calculating the average gradient under the window function when the iteration times are larger than the later, so as to reduce the variance of the random gradient.
The invention also provides a terminal device which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the steps of the diffusion self-adaptive network learning method are realized when the processor executes the computer program.
The present invention also proposes a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the diffusion adaptive network learning method of the present invention described above.
The computer program may be divided into one or more modules/units which are stored in the memory and executed by the processor to perform the method of the present invention.
The terminal can be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server, or simply a processor together with a memory. The processor may be a central processing unit (CPU), but may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The memory may be used to store computer programs and/or modules, and the processor implements the various functions of the diffusion adaptive network learning system by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the technical solution of the present invention in any way. It should be understood that the technical solution can be modified and substituted in several ways without departing from the spirit and principles of the present invention, and such modifications and substitutions also fall within the protection scope of the claims.

Claims (7)

1. A diffusion self-adaptive network learning method, characterized in that: it is applied to a distributed diffusion network and comprises a stochastic gradient descent process and a time-averaged variance-reduced stochastic gradient descent process, wherein in the stochastic gradient descent process each node of the distributed network runs the least-mean-square strategy P_k times and collects the input data received during these P_k runs; in the time-averaged variance-reduced stochastic gradient descent process, the stochastic gradients over a time window of length P_k formed from the previously collected data are averaged to obtain an estimate of the average gradient, and this estimate is used in the next m iterations to update the variance-reduction weight equation; at the first P_k time instants, the stochastic gradient descent process is executed, and once the number of iterations exceeds P_k, the average gradient under the window function is computed, thereby reducing the variance of the stochastic gradient and accelerating the convergence speed of the adaptive algorithm of the whole diffusion network;
the stochastic gradient descent strategy is executed over the distributed network: node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is executed repeatedly and stops once i exceeds the window-function length P_k;
from i > P_k onward, the distributed network executes the variance-reduced stochastic gradient descent strategy: the value of w_{k,i-1} is first assigned to the inner-loop variable, and the average gradient is then estimated with a window function by averaging the stochastic gradients over the collected window, where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k(·) with respect to w_{k,i-1} when the input is the signal x_{k,i};
in the next m iterations, the variance-reduced stochastic gradient is computed from the inner-loop variable and the average gradient, and node k then obtains the estimate w_{k,i} of the diffusion strategy at time i; after the m inner-loop iterations have been executed, the inner-loop variable is updated again, until the algorithm converges;
in the variance-reduced stochastic gradient, the first-order and second-order convergence step sizes of node k satisfy bounds determined by δ_k and ν_k, where δ_k denotes the parameter of the Lipschitz continuity condition and ν_k denotes the strong-convexity parameter of the cost function.
2. The diffusion adaptive network learning method of claim 1, wherein:
the global cost function in the network of the random gradient descent strategy is in the form ofWherein the method comprises the steps ofN represents the total number of nodes in the network, and symbol E (·) represents the data x k,i Is expected from the distribution of (a).
3. The diffusion adaptive network learning method of claim 1, wherein:
the first-order and second-order convergence step length of the node k of the random gradient descent strategy meets the following conditions
Wherein delta k Parameters representing Lipschitz continuity conditions, cost function J k Gradient vector of (-) satisfies delta k Lipschitz continuous conditions.
4. The diffusion adaptive network learning method of claim 1, wherein:
the number m of the internal loops and the length P of the window function k There is a set relationship:
5. A diffusion adaptive network learning system, for use in a distributed diffusion network, comprising:
a stochastic gradient descent execution module for making each node of the distributed network run the least-mean-square strategy P_k times and collecting the input data received during these P_k runs;
a time-averaged variance-reduced stochastic gradient descent execution module for averaging the stochastic gradients over a time window of length P_k formed from the data collected by the stochastic gradient descent execution module to obtain an estimate of the average gradient, and using this estimate in the next m iterations to update the variance-reduction weight equation;
a timing control module for controlling the execution of the stochastic gradient descent process at the first P_k time instants and, once the number of iterations exceeds P_k, controlling the computation of the average gradient under the window function, thereby reducing the variance of the stochastic gradient;
wherein the stochastic gradient descent strategy is executed over the distributed network: node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is executed repeatedly and stops once i exceeds the window-function length P_k;
from i > P_k onward, the distributed network executes the variance-reduced stochastic gradient descent strategy: the value of w_{k,i-1} is first assigned to the inner-loop variable, and the average gradient is then estimated with a window function by averaging the stochastic gradients over the collected window, where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k(·) with respect to w_{k,i-1} when the input is the signal x_{k,i};
in the next m iterations, the variance-reduced stochastic gradient is computed from the inner-loop variable and the average gradient, and node k then obtains the estimate w_{k,i} of the diffusion strategy at time i; after the m inner-loop iterations have been executed, the inner-loop variable is updated again, until the algorithm converges;
in the variance-reduced stochastic gradient, the first-order and second-order convergence step sizes of node k satisfy bounds determined by δ_k and ν_k, where δ_k denotes the parameter of the Lipschitz continuity condition and ν_k denotes the strong-convexity parameter of the cost function.
6. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that: the processor, when executing the computer program, implements the steps of the diffusion adaptive network learning method according to any one of claims 1 to 4.
7. A computer-readable storage medium storing a computer program, characterized in that: the computer program when executed by a processor implements the steps of the diffusion adaptive network learning method according to any one of claims 1 to 4.
CN202011521741.6A 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium Active CN112688809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011521741.6A CN112688809B (en) 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011521741.6A CN112688809B (en) 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN112688809A CN112688809A (en) 2021-04-20
CN112688809B true CN112688809B (en) 2023-10-03

Family

ID=75450048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011521741.6A Active CN112688809B (en) 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112688809B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11315012B2 (en) * 2018-01-12 2022-04-26 Intel Corporation Neural network training using generated random unit vector

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809772A (en) * 2017-10-27 2020-02-18 谷歌有限责任公司 System and method for improving optimization of machine learning models
WO2019235551A1 (en) * 2018-06-05 2019-12-12 Okinawa Institute Of Science And Technology School Corporation Total stochastic gradient estimation method, device and computer program
CN109492753A (en) * 2018-11-05 2019-03-19 中山大学 A kind of method of the stochastic gradient descent of decentralization
CN110929878A (en) * 2019-10-30 2020-03-27 同济大学 Distributed random gradient descent method
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Distributed stochastic gradient descent algorithm based on difference merging; Chen Zhenhong; Lan Yanyan; Guo Jiafeng; Cheng Xueqi; Chinese Journal of Computers (No. 10); full text *
Variance-reduced stochastic variational inference algorithm for large-scale data topic modeling; Liu Zhanghu; Cheng Chunling; Journal of Computer Applications (No. 06); full text *

Also Published As

Publication number Publication date
CN112688809A (en) 2021-04-20

Similar Documents

Publication Publication Date Title
Ergen et al. Online training of LSTM networks in distributed systems for variable length data sequences
Deng et al. Distributed optimisation design with triggers for disturbed continuous‐time multi‐agent systems
Liu et al. Robust decentralized output regulation with single or multiple reference signals for uncertain heterogeneous systems
Ali et al. Synchronization of complex dynamical networks with hybrid coupling delays on time scales by handling multitude Kronecker product terms
You et al. Distributed adaptive event‐triggered control for leader‐following consensus of multi‐agent systems
Wang et al. Event-triggered consensus control for second-order multi-agent system subject to saturation and time delay
Li et al. Adaptive control for cooperative linear output regulation of heterogeneous multi‐agent systems with periodic switching topology
CN114851198B (en) Consistent tracking fixed time stable control method for multiple single-link mechanical arms
Xia et al. Optimal synchronization control of heterogeneous asymmetric input-constrained unknown nonlinear MASs via reinforcement learning
Cui et al. Distributed containment control for nonlinear multiagent systems in pure‐feedback form
Di Lorenzo et al. Distributed nonconvex optimization over time-varying networks
Wang et al. Distributed solution of linear equations over unreliable networks
Zhang et al. Cooperative output regulation of heterogeneous linear multi-agent systems via fully distributed event-triggered adaptive control
Dai et al. Event-triggered exponential synchronization of complex dynamical networks with cooperatively directed spanning tree topology
Hale et al. Cloud-based optimization: A quasi-decentralized approach to multi-agent coordination
Ma et al. Finite‐time average consensus based approach for distributed convex optimization
Liu et al. On the stability of the endemic equilibrium of a discrete-time networked epidemic model
Xu et al. Distributed event‐triggered tracking control with a dynamic leader for multiple Euler‐Lagrange systems under directed networks
CN107563511B (en) Method for quickly estimating and optimizing available time of real-time system
CN112688809B (en) Diffusion self-adaptive network learning method, system, terminal and storage medium
Huang et al. A robust and efficient system identification method for a state-space model with heavy-tailed process and measurement noises
Edwards et al. On distributed pinning observers for a network of dynamical systems
CN117151195A (en) Model optimization method, device, equipment and medium based on inversion normalization
CN115268275A (en) Multi-agent system consistency tracking method and system based on state observer
Chen et al. Gain‐scheduled robust control for multi‐agent linear parameter‐varying systems with communication delays

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant