CN114815834A - A dynamic path planning method for mobile agents in a stage environment - Google Patents

A dynamic path planning method for mobile agents in a stage environment

Info

Publication number
CN114815834A
CN114815834A (application CN202210465123.7A)
Authority
CN
China
Prior art keywords
mobile agent
dynamic
obstacles
state
obstacle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210465123.7A
Other languages
Chinese (zh)
Other versions
CN114815834B (en)
Inventor
刘安东
张柏鑫
倪洪杰
曹瀚仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210465123.7A
Publication of CN114815834A
Application granted
Publication of CN114815834B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02 Control of position or course in two dimensions
    • G05D 1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D 1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D 1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02 Control of position or course in two dimensions
    • G05D 1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D 1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D 1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02 Control of position or course in two dimensions
    • G05D 1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D 1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic path planning method for a mobile agent in a stage environment, belonging to the technical field of intelligent robot path planning. The method first obtains information about the obstacles around the mobile agent by constructing a global map and classifies the obstacles into dynamic and static obstacles; it then builds a local map, encodes the dynamic obstacle information through an LSTM network, and computes the importance of each dynamic obstacle through a social attention mechanism to achieve better obstacle avoidance. A new reward function is constructed to handle the different avoidance requirements of dynamic and static obstacles, thereby solving the path planning problem of the mobile agent in a complex stage environment. A new experience pool update method is proposed to improve the convergence speed of network training, and simulation experiments demonstrate the superiority of the algorithm, giving the method high practical value.

Description

A dynamic path planning method for mobile agents in a stage environment

Technical Field

The present invention relates to the technical field of intelligent robot path planning, and in particular to a dynamic path planning method for a mobile agent in a stage environment.

Background Art

To meet the diverse cultural service needs at the grassroots level, the activity spaces of small grassroots cultural service complexes must host cultural performances, meetings, exhibitions and reading, folk activities, and other cultural services. Cultural facilities configured by separate category cannot meet residents' expectations for comprehensive cultural services. Small cultural service complexes address this problem well; their focus is the construction of multi-functional activity spaces that integrate folk activities, exhibitions, meetings, reading, and other functions into a single venue. This both satisfies the diversity and self-organization of grassroots rural culture and explores a new model that meets the requirements of public cultural services in China's new rural areas.

To reduce the waste of land resources and improve space utilization, various intelligent mobile bodies are needed to assist in the rapid combination and switching of functional spaces. Because the service spaces differ, the facilities and usage requirements within each functional space differ as well. To meet the "one hall, many uses" requirement of a small cultural complex, intelligent mobile bodies are often needed to rapidly combine and switch between multiple functional spaces and to perform dynamic path planning in the crowded environment of the small complex, so that a single space can serve multiple cultural needs.

The space of a small cultural complex is a typical environment in which people, machines, and objects coexist. During functional space switching, multiple pieces of equipment in the space must plan and execute trajectories in the presence of other equipment to realize the switching of spatial functions. Quickly avoiding dynamic and static obstacles and reaching the target point during switching therefore requires a dynamic path planning algorithm to control the intelligent mobile body, and because the environment inside the cultural complex is crowded, the requirements placed on the algorithm are high.

Traditional dynamic path planning algorithms rely on rapid sensor refresh to perceive surrounding obstacles; the planned path changes as the dynamic obstacles change, producing detours or unnatural trajectories. Such algorithms cannot predict the motion trends of surrounding dynamic obstacles and lack adaptability. For these reasons, a dynamic path planning method that can distinguish dynamic from static obstacles and predict the trends of dynamic obstacles is urgently needed.

Summary of the Invention

In view of the problems in the prior art, the purpose of the present invention is to provide a dynamic path planning method for a mobile agent in a stage environment. On the basis of a deep reinforcement learning method, the method designs a new Markov decision process and network structure: a social attention mechanism is introduced to assign attention scores to dynamic obstacles; a long short-term memory neural network is used to solve the problem that the input dimension of a feedforward network is not fixed; a new reward function is constructed to handle the different avoidance requirements of dynamic and static obstacles; and a new experience pool update method is proposed to improve the convergence speed of network training, so that the mobile agent can distinguish dynamic from static obstacles and predict the motion trends of dynamic obstacles. To achieve the above purpose, the technical solution adopted by the present invention is as follows.

A dynamic path planning method for a mobile agent in a stage environment comprises the following steps:

1) Establish a simulation environment model of the mobile agent and of the dynamic and static obstacles based on the gym library.
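As an illustrative sketch only (the patent discloses no source code), step 1) could be realized with the gym API roughly as follows; the class name StageEnv, the obstacle counts, and the state layout are assumptions taken from the state definitions given later in this description:

```python
import gym
import numpy as np
from gym import spaces

class StageEnv(gym.Env):
    """Hypothetical sketch of the stage simulation: one mobile agent plus
    dynamic and static obstacles. step(), reward and obstacle dynamics
    are omitted from this sketch."""

    def __init__(self, n_dynamic=5, n_static=3):
        super().__init__()
        self.n_dynamic, self.n_static = n_dynamic, n_static
        # 90 discrete (linear velocity, angular velocity) combinations, see step 2)
        self.action_space = spaces.Discrete(90)
        # flattened joint state u_t = [S_T, S_S, S_D]
        obs_dim = 9 + 3 * n_static + 6 * n_dynamic
        self.observation_space = spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)

    def reset(self):
        # agent state S_T = [Px, Py, Gx, Gy, Vx, Vy, theta, r, Vpref]
        self.agent = np.array([0.0, -10.0, 0.0, 7.5, 0.0, 0.0, 0.0, 0.3, 1.0])
        # dynamic obstacles S_D = [Px, Py, Vx, Vy, r, Vpref]; static S_S = [Px, Py, r]
        self.dynamic = np.random.uniform(-5.0, 5.0, (self.n_dynamic, 6))
        self.static = np.random.uniform(-5.0, 5.0, (self.n_static, 3))
        return self._joint_state()

    def _joint_state(self):
        return np.concatenate(
            [self.agent, self.static.ravel(), self.dynamic.ravel()]).astype(np.float32)
```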

2) Design a Markov decision process: design the state space S, the action space A, the transition probability P, the reward R, and the discount factor γ.

3) Design the neural network structure.

4) Use the optimal reciprocal collision avoidance (ORCA) algorithm to initialize the network parameters through imitation-learning pre-training; after imitation learning ends, optimize the network parameters by training on the actual interaction of the mobile agent in the simulation environment.

5) Train the neural network with the adaptive moment estimation method (Adam) to obtain the optimal value function:

V*(u_t) = ∑ γ^(Δt·V_pref) · P(u_t, a_t)

6) Set the optimal policy by maximizing the cumulative return:

π*(u_t) = argmax_{a_t} [ R(u_t, a_t) + γ^(Δt·V_pref) ∫ P(u_t, u_{t+Δt} | a_t) · V*(u_{t+Δt}) du_{t+Δt} ]

where u_t denotes the current joint state of the mobile agent and the obstacles, a_t denotes the action space set, γ denotes the decay factor, Δt denotes the time interval between two actions, V_pref denotes the preferred speed, V* denotes the optimal value function, P denotes the state transition function, R denotes the reward function, and u_{t+Δt} denotes the joint state at the next moment.

7) Select the action a_t at the current moment according to the optimal policy until the mobile agent reaches the goal.
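Steps 5) to 7) amount to a one-step lookahead over the discrete action set: propagate the joint state under each candidate action, score the successor state with the value network, and act greedily. A minimal sketch, assuming a trained value network `value_net` that accepts the flat joint state and a propagation helper `propagate` standing in for the transition model P (both names are assumptions):

```python
import torch

def select_action(u_t, actions, value_net, propagate, reward_fn,
                  gamma=0.9, dt=0.25, v_pref=1.0):
    """Greedy one-step lookahead:
    a_t = argmax_a [ R(u_t, a) + gamma**(dt * v_pref) * V*(u') ]."""
    best_a, best_q = None, -float("inf")
    for a in actions:
        u_next = propagate(u_t, a, dt)   # approximate next joint state under a
        with torch.no_grad():
            v = value_net(torch.as_tensor(u_next, dtype=torch.float32)).item()
        q = reward_fn(u_t, a) + gamma ** (dt * v_pref) * v
        if q > best_q:
            best_q, best_a = q, a
    return best_a
```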

Further, in step 1), the mobile agent and the dynamic obstacles are set as circles with a radius of 0.3 m, while the static obstacles are defined as circles with radii between 0.5 m and 1 m or as quadrilaterals with areas between 1 m² and 1.5 m².

Further, in step 2), the state space S is set as follows: the state of a dynamic obstacle is S_D = [P_x, P_y, V_x, V_y, r, V_pref]; the state of a static obstacle is S_S = [P_x, P_y, r]; the state of the mobile agent is S_T = [P_x, P_y, G_x, G_y, V_x, V_y, θ, r, V_pref]; and the joint state is u_t = [S_T, S_S, S_D]. Here (P_x, P_y) is the current position of the mobile agent or of an obstacle, (G_x, G_y) is the position of the set goal point, θ is the heading angle of the mobile agent, r is the radius of the mobile agent or obstacle, V_pref is the preferred speed of the mobile agent, and (V_x, V_y) is the moving velocity of the mobile agent or of a dynamic obstacle.

The action space A consists of linear and angular velocities. To satisfy the kinematic constraints, the angular velocity is divided into 18 equal parts over the interval [-π/4, π/4], and the linear velocity follows an exponential function of x (given in the patent as a formula image, not reproduced here); taking x = 1, 2, 3, 4, 5 yields five smoothly varying linear velocities, so the action space contains 90 action combinations in total.
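The 18 × 5 = 90 discrete actions can be enumerated as follows; since the patent's velocity formula is only given as an image, the exponential spacing shown here is the form commonly used in this family of methods and is an assumption:

```python
import numpy as np

def build_action_space(v_pref=1.0):
    # 18 angular velocities equally spaced over [-pi/4, pi/4]
    rotations = np.linspace(-np.pi / 4, np.pi / 4, 18)
    # 5 smoothly varying linear speeds, exponentially spaced in (0, v_pref]
    speeds = [(np.exp(x / 5.0) - 1.0) / (np.e - 1.0) * v_pref for x in range(1, 6)]
    return [(v, w) for w in rotations for v in speeds]

actions = build_action_space()
assert len(actions) == 90   # 18 angular x 5 linear combinations
```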

The transition probability P is approximated by a trajectory prediction model.

The reward R is set as a piecewise function (given in the patent as a formula image, not reproduced here), where G_{x,y} is the position information of the goal point, P_{x,y} is the current position information of the mobile agent, d_s is the distance between the mobile agent and a static obstacle, and d_d is the distance between the mobile agent and a dynamic obstacle; the discount factor γ is set to 0.9.
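Because the reward image is not reproduced, the sketch below only illustrates the structure this paragraph implies: a goal bonus keyed on the distance between G_{x,y} and P_{x,y}, and separate penalty branches keyed on d_s and d_d. Every numeric constant here is an illustrative placeholder, not the patent's actual reward:

```python
import numpy as np

def reward(G, P, d_s, d_d, goal_tol=0.3):
    """Structural sketch of R; all thresholds and values are placeholders."""
    if np.linalg.norm(np.asarray(G) - np.asarray(P)) < goal_tol:
        return 1.0                      # goal reached
    if d_s <= 0.0 or d_d <= 0.0:
        return -0.25                    # collision with any obstacle
    r = 0.0
    if d_d < 0.2:                       # discomfort zone around dynamic obstacles
        r += -0.1 + d_d / 2.0
    if d_s < 0.1:                       # smaller clearance margin for static obstacles
        r += -0.05 + d_s / 4.0
    return r
```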

Further, the network structure in step 3) consists of the following modules. 1. Input layer: the input layer receives the joint state u_t = [S_T, S_S, S_D] from the above steps. 2. Long short-term memory (LSTM) module: the LSTM module sorts the obstacles around the mobile agent and fixes the output dimension of the network layer. 3. Social attention mechanism: the social attention module analyzes the probability of collision between the mobile agent and the surrounding dynamic obstacles and presents it in the form of scores. 4. Output layer: the output layer outputs the optimal value function V*(u_t) through a weighted linear combination of the network parameters.

Further, the network operation flow in step 3) is as follows: first, the state information of the mobile agent and the obstacles is fed into the network; the obstacles are then divided into dynamic and static obstacles according to the state information; the states of the mobile agent and of the dynamic obstacles are input into the LSTM module and then into the social attention module; the processed states, the obtained interaction features, and the static obstacle states are input into two fully connected layers; finally, an activation function normalizes the result to obtain the optimal value function.
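A compact PyTorch sketch of the module chain described above (an LSTM over the dynamic-obstacle states, a social-attention scoring step, concatenation with the agent and static-obstacle states, two fully connected layers, and a value output); all layer widths are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueNetwork(nn.Module):
    def __init__(self, agent_dim=9, dyn_dim=6, static_dim=3, n_static=3,
                 hidden=64, attn=32):
        super().__init__()
        self.lstm = nn.LSTM(dyn_dim, hidden, batch_first=True)
        # social attention: one score per dynamic obstacle
        self.attn_score = nn.Sequential(
            nn.Linear(hidden, attn), nn.ReLU(), nn.Linear(attn, 1))
        self.fc = nn.Sequential(
            nn.Linear(agent_dim + hidden + n_static * static_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 1)

    def forward(self, s_agent, s_dynamic, s_static):
        # s_dynamic: (batch, n_dynamic, dyn_dim), assumed sorted, e.g. by distance
        h, _ = self.lstm(s_dynamic)                    # per-obstacle features
        scores = F.softmax(self.attn_score(h), dim=1)  # attention scores
        crowd = (scores * h).sum(dim=1)                # weighted interaction feature
        x = torch.cat([s_agent, crowd, s_static.flatten(1)], dim=1)
        return self.out(self.fc(x))                    # V*(u_t)
```

In this sketch the attention weights play the role of the collision-probability scores, and the LSTM hidden size fixes the output dimension regardless of how many dynamic obstacles are present.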

Further, in step 4), when the mobile agent interacts in the simulation environment, the current state information, action information, and reward information are stored in the experience pool as one experience. When the pool reaches its maximum capacity, new experiences replace stored experiences with low rewards, which raises the probability of selecting good experiences and improves the convergence speed of the network. During each episode of interaction, the current episode ends when the mobile agent hits an obstacle or exceeds the maximum running time of a single episode. The experiences are then used to update the network parameters through gradient back-propagation.
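A sketch of the replacement rule just described: once the pool is full, an incoming experience overwrites the stored experience with the lowest reward rather than the oldest one (the tuple layout is an assumption):

```python
import numpy as np

class ExperiencePool:
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data = []   # entries: (state, action, reward, next_state, done)

    def push(self, exp):
        if len(self.data) < self.capacity:
            self.data.append(exp)
            return
        worst = min(range(len(self.data)), key=lambda i: self.data[i][2])
        if exp[2] > self.data[worst][2]:   # replace only lower-reward entries
            self.data[worst] = exp

    def sample(self, batch_size):
        idx = np.random.randint(0, len(self.data), size=batch_size)
        return [self.data[i] for i in idx]
```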

The beneficial effects of the present invention are as follows: a new Markov decision process is designed to adapt to the complex obstacle situation in the complex space; a new network structure is designed to classify and process dynamic and static obstacles and to predict dynamic obstacles; a new reward function is designed to handle different obstacle situations; a new experience pool update method is proposed to improve the training efficiency of the neural network; and imitation learning is used to pre-train the network and improve its convergence speed. The mobile agent can thus perform efficient dynamic path planning within the complex space.

Description of the Drawings

Fig. 1 is a flow chart of the method implementation according to an embodiment of the present invention;

Fig. 2 is a diagram of the network structure in an embodiment of the present invention;

Fig. 3 is a diagram of the simulation results in an embodiment of the present invention;

Fig. 4 is a graph of the total reward during network training in an embodiment of the present invention.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.

The purpose of the present invention is to provide a dynamic path planning method for a mobile agent in a stage environment. On the basis of a deep reinforcement learning method, the method designs a new Markov decision process and network structure: a social attention mechanism is introduced to assign attention scores to dynamic obstacles; a long short-term memory neural network solves the problem that the input dimension of a feedforward network is not fixed; a new reward function handles the different avoidance requirements of dynamic and static obstacles; and a new experience pool update method improves the convergence speed of network training, so that the mobile agent can distinguish dynamic from static obstacles and predict the motion trends of dynamic obstacles.

In this embodiment, the simulation environment is shown in Fig. 3. The range of the planning map is 10 × 10, the starting point of the path planning is (0, -10), and the target point is (0, 7.5). The positions of the static obstacles are randomly distributed, and the static obstacles are rectangles or squares; the dynamic obstacles are circles with a radius of 0.5.
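Under the StageEnv sketch given earlier, this embodiment's configuration would be instantiated roughly as follows (the obstacle counts are assumptions; the patent leaves them open):

```python
env = StageEnv(n_dynamic=5, n_static=3)
u_0 = env.reset()   # start (0, -10), goal (0, 7.5), as in Fig. 3
```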

A dynamic path planning algorithm for a mobile agent in a stage environment comprises the following specific steps:

1) Establish a simulation environment model of the mobile agent and of the dynamic and static obstacles based on the gym library.

2) Design a Markov decision process: design the state space S, the action space A, the transition probability P, the reward R, and the discount factor γ.

3) Design the neural network structure.

4) Use the optimal reciprocal collision avoidance (ORCA) algorithm to initialize the network parameters through 3000 episodes of imitation-learning pre-training; then optimize the network parameters by training on the actual interaction of the mobile agent in the simulation environment.
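A minimal sketch of this imitation-learning pre-training, assuming a gym-style environment with step(), an `orca_policy(u_t)` expert helper (ORCA implementations exist, e.g. in RVO2 bindings, but this wrapper is an assumption), and a value network that accepts the flat joint state:

```python
import torch

def pretrain_with_orca(env, value_net, orca_policy, episodes=3000,
                       gamma=0.9, dt=0.25, v_pref=1.0, lr=1e-3):
    """Roll out the ORCA expert, then regress the value network onto the
    discounted returns of the demonstrated states (step 5 uses Adam)."""
    opt = torch.optim.Adam(value_net.parameters(), lr=lr)
    for _ in range(episodes):
        u, done, traj = env.reset(), False, []
        while not done:
            u_next, r, done, _ = env.step(orca_policy(u))   # follow the expert
            traj.append((u, r))
            u = u_next
        ret = 0.0
        for u_t, r in reversed(traj):       # discounted return along the trajectory
            ret = r + gamma ** (dt * v_pref) * ret
            v = value_net(torch.as_tensor(u_t, dtype=torch.float32))
            loss = (v - ret).pow(2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
```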

5) Train the neural network with the adaptive moment estimation method (Adam) to obtain the optimal value function:

V*(u_t) = ∑ γ^(Δt·V_pref) · P(u_t, a_t)

6) Set the optimal policy by maximizing the cumulative return:

π*(u_t) = argmax_{a_t} [ R(u_t, a_t) + γ^(Δt·V_pref) ∫ P(u_t, u_{t+Δt} | a_t) · V*(u_{t+Δt}) du_{t+Δt} ]

where u_t denotes the current joint state of the mobile agent and the obstacles, a_t denotes the action space set, γ denotes the decay factor, Δt denotes the time interval between two actions, V_pref denotes the preferred speed, V* denotes the optimal value function, P denotes the state transition function, R denotes the reward function, and u_{t+Δt} denotes the joint state at the next moment.

7) Select the action a_t at the current moment according to the optimal policy until the mobile agent reaches the goal.

The above is a preferred embodiment of the present invention. Any changes made according to the technical solution of the present invention fall within the protection scope of the present invention, provided that the resulting functional effects do not exceed the scope of the technical solution of the present invention.

Claims (6)

1. A dynamic path planning method for a mobile agent in a stage environment, characterized by comprising the following steps:
1) establishing a simulation environment model of the mobile agent and of dynamic and static obstacles based on the gym library;
2) designing a Markov decision process, the Markov decision process being represented by the quintuple <S, A, P, R, γ>; designing the state space S, the action space A, the transition probability P, the reward R, and the discount factor γ;
3) designing the neural network structure;
4) using the optimal reciprocal collision avoidance algorithm ORCA to initialize the network parameters through imitation-learning pre-training; after the imitation learning ends, optimizing the network parameters by training on the actual interaction of the mobile agent in the simulation environment;
5) training the neural network with the adaptive moment estimation method Adam to obtain the optimal value function:
V*(u_t) = ∑ γ^(Δt·V_pref) · P(u_t, a_t)
6) setting the optimal policy by maximizing the cumulative return:
π*(u_t) = argmax_{a_t} [ R(u_t, a_t) + γ^(Δt·V_pref) ∫ P(u_t, u_{t+Δt} | a_t) · V*(u_{t+Δt}) du_{t+Δt} ]
where u_t denotes the current joint state of the mobile agent and the obstacles, a_t denotes the action space set, γ denotes the decay factor, Δt denotes the time interval between two actions, V_pref denotes the preferred speed, V* denotes the optimal value function, P denotes the state transition function, R denotes the reward function, and u_{t+Δt} denotes the joint state at the next moment;
7) selecting the action a_t at the current moment according to the optimal policy until the mobile agent reaches the goal.
2. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that in step 1) the simulation environment model is established based on the gym library, the mobile agent and the dynamic obstacles being set as circles with a radius of 0.3 m, and the static obstacles being defined as circles with radii between 0.5 m and 1 m or as quadrilaterals with areas between 1 m² and 1.5 m².
3. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that in step 2) the state space S is set such that the state of a dynamic obstacle is S_D = [P_x, P_y, V_x, V_y, r, V_pref], the state of a static obstacle is S_S = [P_x, P_y, r], the state of the mobile agent is S_T = [P_x, P_y, G_x, G_y, V_x, V_y, θ, r, V_pref], and the joint state is u_t = [S_T, S_S, S_D]; where (P_x, P_y) is the current position of the mobile agent or of an obstacle, (G_x, G_y) is the position of the set goal point, θ is the heading angle of the mobile agent, r is the radius of the mobile agent or obstacle, V_pref is the preferred speed of the mobile agent, and (V_x, V_y) is the moving velocity of the mobile agent or of a dynamic obstacle;
the action space A consists of linear and angular velocities; to satisfy the kinematic constraints, the angular velocity is divided into 18 equal parts over the interval [-π/4, π/4], and the linear velocity follows an exponential function
(given in the patent as a formula image, not reproduced here) of x; taking x = 1, 2, 3, 4, 5 yields five smoothly varying linear velocities, and the action space contains 90 action combinations in total;
the transition probability P transfers the state through the actual interaction of the mobile agent in the simulation environment; the reward R is set as:
a piecewise function (given in the patent as a formula image, not reproduced here),
where G_{x,y} is the position information of the goal point, P_{x,y} is the current position information of the mobile agent, d_s is the distance between the mobile agent and a static obstacle, and d_d is the distance between the mobile agent and a dynamic obstacle.
4. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that the neural network structure in step 3) comprises an input layer, a long short-term memory (LSTM) module, a social attention mechanism, and an output layer; the input layer receives the joint state u_t = [S_T, S_S, S_D] from the above steps; the LSTM module sorts the obstacles around the mobile agent and fixes the output dimension of the network layer; the social attention module analyzes the probability of collision between the mobile agent and the surrounding dynamic obstacles and presents it in the form of scores; the output layer outputs the optimal value function V*(u_t) through a weighted linear combination of the network parameters.
5. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that the network operation flow in step 3) is as follows: first, the state information of the mobile agent and the obstacles is input into the neural network structure; the obstacles are then divided into dynamic and static obstacles according to the state information; the states of the mobile agent and of the dynamic obstacles are input into the LSTM module and then into the social attention module; the processed states, the obtained interaction features, and the static obstacle states are input into two fully connected layers; finally, they are normalized through an activation function to obtain the optimal value function.
6. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that in step 4), when the mobile agent interacts in the simulation environment, the current state information, action information, and reward information are stored in the experience pool as one experience; importance is assigned to each experience through the TD-error, which is the difference between the value function of an action at a certain moment and the optimal value function of the current network, a larger difference indicating a poorer current experience; by defining:
P_t = (|δ_t| + ε)^α
where P_t is the probability of selecting the current experience, α and ε are constants, δ_t is the TD-error, and ε prevents an experience from no longer being replayed once its TD-error reaches 0;
experiences are thus assigned different selection probabilities, which raises the probability of selecting excellent experiences and improves the convergence speed of the network; during each episode of interaction, the current episode ends when the mobile agent hits an obstacle or exceeds the maximum running time of a single episode; the experiences are then used to update the network parameters through gradient back-propagation.
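Claim 6's prioritization can be sketched as follows: each stored experience is sampled with probability proportional to P_t = (|δ_t| + ε)^α, normalized over the pool (the values of α and ε are illustrative):

```python
import numpy as np

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-2):
    """P_t = (|delta_t| + eps) ** alpha, normalized over the pool."""
    p = (np.abs(np.asarray(td_errors, dtype=np.float64)) + eps) ** alpha
    return p / p.sum()

def sample_batch(pool, td_errors, batch_size):
    probs = sampling_probabilities(td_errors)
    idx = np.random.choice(len(pool), size=batch_size, p=probs)
    return [pool[i] for i in idx]
```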
CN202210465123.7A 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment Active CN114815834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210465123.7A CN114815834B (en) 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210465123.7A CN114815834B (en) 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment

Publications (2)

Publication Number Publication Date
CN114815834A true CN114815834A (en) 2022-07-29
CN114815834B CN114815834B (en) 2024-11-29

Family

ID=82509534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210465123.7A Active CN114815834B (en) 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment

Country Status (1)

Country Link
CN (1) CN114815834B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090688A (en) * 2023-04-10 2023-05-09 中国人民解放军国防科技大学 Moving Target Traversal Access Sequence Planning Method Based on Improved Pointer Network
CN118394109A (en) * 2024-06-26 2024-07-26 烟台中飞海装科技有限公司 Simulated countermeasure training method based on multi-agent reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Collision avoidance planning method for mobile robot based on deep reinforcement learning in dynamic environment
CN112666939A (en) * 2020-12-09 2021-04-16 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience
CN113342047A (en) * 2021-06-23 2021-09-03 大连大学 Unmanned aerial vehicle path planning method for improving artificial potential field method based on obstacle position prediction in unknown environment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Collision avoidance planning method for mobile robot based on deep reinforcement learning in dynamic environment
CN112666939A (en) * 2020-12-09 2021-04-16 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience
CN113342047A (en) * 2021-06-23 2021-09-03 大连大学 Unmanned aerial vehicle path planning method for improving artificial potential field method based on obstacle position prediction in unknown environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈旿 et al.: "A Multi-Agent Cooperative Information Consistency Algorithm", Acta Aeronautica et Astronautica Sinica (航空学报), vol. 38, no. 12, 25 December 2017 (2017-12-25) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090688A (en) * 2023-04-10 2023-05-09 中国人民解放军国防科技大学 Moving Target Traversal Access Sequence Planning Method Based on Improved Pointer Network
CN118394109A (en) * 2024-06-26 2024-07-26 烟台中飞海装科技有限公司 Simulated countermeasure training method based on multi-agent reinforcement learning

Also Published As

Publication number Publication date
CN114815834B (en) 2024-11-29

Similar Documents

Publication Publication Date Title
Zhao et al. The experience-memory Q-learning algorithm for robot path planning in unknown environment
Shah et al. Long-distance path planning for unmanned surface vehicles in complex marine environment
CN114397896B (en) A Dynamic Path Planning Method Based on Improved Particle Swarm Optimization Algorithm
Sundarraj et al. Route planning for an autonomous robotic vehicle employing a weight-controlled particle swarm-optimized Dijkstra algorithm
Lv et al. Blind travel prediction based on obstacle avoidance in indoor scene
CN111611749B (en) Simulation method and system for automatic guidance of indoor crowd evacuation based on RNN
CN114815834A (en) A dynamic path planning method for mobile agents in a stage environment
CN112799386A (en) Robot Path Planning Method Based on Artificial Potential Field and Reinforcement Learning
Raheem et al. Development of a* algorithm for robot path planning based on modified probabilistic roadmap and artificial potential field
Chang et al. Interpretable fuzzy logic control for multirobot coordination in a cluttered environment
Lamouik et al. Deep neural network dynamic traffic routing system for vehicles
CN117289691A (en) Training method for path planning agent for reinforcement learning in navigation scene
CN114089751A (en) A Path Planning Method for Mobile Robots Based on Improved DDPG Algorithm
CN118348975A (en) Path planning method, amphibious unmanned platform, storage medium and program product
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Ou et al. Hybrid path planning based on adaptive visibility graph initialization and edge computing for mobile robots
Lei et al. Digital twin‐based multi‐objective autonomous vehicle navigation approach as applied in infrastructure construction
Xue et al. Multi-agent path planning based on MPC and DDPG
Jiang et al. Fuzzy neural network based dynamic path planning
CN115202357A (en) An autonomous mapping method based on spiking neural network
Wang et al. A mapless navigation method based on deep reinforcement learning and path planning
CN119289981A (en) A mobile robot path planning method based on SAC algorithm
Kodagoda et al. Socially aware path planning for mobile robots
Hliwa et al. Optimal path planning of mobile robot using hybrid tabu search-firefly algorithm
Tran et al. Mobile robot planner with low-cost cameras using deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant