CN113947194A - Lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply - Google Patents
- Publication number: CN113947194A
- Application number: CN202111211867.8A
- Authority: CN (China)
- Prior art keywords: oxygen supply, plateau, reinforcement learning, learning model, action
- Prior art date: 2021-10-18
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- C—CHEMISTRY; METALLURGY
- C01—INORGANIC CHEMISTRY
- C01B—NON-METALLIC ELEMENTS; COMPOUNDS THEREOF; METALLOIDS OR COMPOUNDS THEREOF NOT COVERED BY SUBCLASS C01C
- C01B13/00—Oxygen; Ozone; Oxides or hydroxides in general
- C01B13/02—Preparation of oxygen
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Inorganic Chemistry (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The invention discloses a lightweight reinforcement learning model construction method for intelligent oxygen supply in plateau scenes, comprising the following steps: inputting the environment state, preprocessing the data, deciding an output action with a neural network, receiving the reward fed back by the environment, and updating the parameters of the neural network. The method comprehensively considers the many factors of extreme environments such as plateaus, improving the accuracy of the model while reducing its computational cost as far as possible under the constraint of making correct decisions; it completes the oxygen supply task efficiently and can serve as the model basis for intelligently controlling the oxygen output of an oxygen supply system.
Description
Technical Field
The invention relates to the technical field of machine learning, and in particular to a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply.
Background
Altitude sickness is the set of uncomfortable symptoms that appear after a person ascends to a plateau above 3000 meters and is exposed to its low-pressure, low-oxygen environment; it is a common condition unique to plateau regions. Its harm to the human body is considerable, and reducing it has great psychological and physiological significance, so a portable, intelligent oxygen supply system is urgently needed for people working at high altitude. The reinforcement learning techniques of machine learning can be adopted, letting an agent gradually adapt to the environment during training so as to obtain the best overall benefit.
Invention patent 201210307733.0 describes a portable oxygen generator suitable for plateau areas, which adds a series of intelligent judgment devices to a conventional oxygen generator to adjust the output oxygen flow and form a pulsed oxygen supply. However, that device is rather bulky, unsuitable for a single person to carry, and usable only in non-mobile scenarios.
Disclosure of Invention
The invention provides a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply. The model uses reinforcement learning from the field of artificial intelligence: it can comprehensively consider the many factors of the extreme plateau environment, improving the accuracy of the model while reducing its computational cost as far as possible under the constraint of making correct decisions; it completes the oxygen supply task efficiently and can serve as a model for intelligently controlling the oxygen output of an oxygen supply system.
The invention provides a method for constructing a lightweight reinforcement learning model for plateau scene intelligent oxygen supply, which comprises the following steps:
s1: receiving plateau environmental state information of current oxygen supply;
s2: preprocessing the plateau environmental state information to obtain an environmental state matrix;
s3: obtaining a profit estimation value set of each action by utilizing a neural network according to the environment state matrix;
s4: acquiring the optimal action in the income estimation value set of each action;
s5: judging whether the optimal action is a preset optimal action or not, if so, ending the current oxygen supply and sending the optimal action to an external task controller; otherwise, return to step S1.
Optionally, in step S1, the current oxygen supply environmental status information includes: environmental data and task data.
Optionally, the environmental data includes: altitude and temperature and humidity; and/or the task data comprises blood oxygen saturation, heart rate parameters and respiratory parameters.
Alternatively, the step S2 includes: sampling the environment state information; and carrying out noise reduction operation on the sampled environmental state information to obtain the environmental state matrix.
Optionally, in step S3, the neural network includes an input layer, a fully-connected layer and an output layer, the input layer is used for inputting the environment state matrix, the output layer outputs the profit estimation value sets of the actions, and the fully-connected layer simultaneously connects the input layer and the output layer.
The invention has the following beneficial effects:
the reinforcement learning model can adapt well to the plateau environment: it learns better actions from the special environment states of the plateau, such as altitude, temperature and oxygen content, so that the total benefit is highest, guaranteeing the accuracy of the decision result; and it adopts a lightweight neural network, i.e. one with fewer parameters and layers, reducing the computation so that the model can run well on embedded devices.
The core of the model adopts the Q-learning algorithm. Because the state of the plateau scene is complex, Q-learning needs a Q-value function to evaluate the value of taking a given action in a given environment state, and the original Q-learning algorithm stores the Q-value of every state in a table. Since the environment states of a plateau scene admit very many possibilities, table storage would occupy a large space, cost much, and query slowly; to solve this problem, a neural network is adopted to approximate the traditional Q-value distribution.
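For contrast with the table-based variant that the paragraph above rejects, the standard tabular Q-learning update can be sketched as follows; the learning rate `alpha` and discount `gamma` are generic hyperparameters, not values given in the patent:

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    The patent replaces this table with a small network precisely
    because the plateau state space is too large to enumerate."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table
```

Approximating Q with a network keeps the same update target but swaps the table lookup for a forward pass and the table write for a gradient step.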
The input layer of the neural network is a fully connected layer of 3 neurons, receiving a one-dimensional matrix formed from the altitude, temperature and blood oxygen saturation of the current state; the output layer is a fully connected layer with one neuron for each action in the action set A, outputting a matrix of estimated values for each action. The network may contain 2 hidden layers, fully connected layers of 10 and 8 neurons respectively.
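The 3 → 10 → 8 → |A| architecture just described can be sketched in plain NumPy. The ReLU activations, the weight scale, and the action count are assumptions, since the patent specifies only the layer sizes:

```python
import numpy as np

class TinyQNet:
    """Minimal sketch of the described 3 -> 10 -> 8 -> |A| network.
    Weights are randomly initialised for illustration only."""
    def __init__(self, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [3, 10, 8, n_actions]
        self.layers = [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
                       for i, o in zip(sizes[:-1], sizes[1:])]

    def __call__(self, state):
        x = np.asarray(state, dtype=float)   # [altitude, temperature, SpO2]
        for k, (w, b) in enumerate(self.layers):
            x = x @ w + b
            if k < len(self.layers) - 1:     # ReLU on hidden layers only
                x = np.maximum(x, 0.0)
        return x                             # one Q estimate per action
```

With roughly 3·10 + 10·8 + 8·|A| weights, the network is small enough for the embedded deployment the patent targets.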
The model work flow of the invention is as follows:
the model receives as input the data returned by the environment state sensors and the task state sensors: the environment state information, such as the temperature T_t of the environment at time t and the altitude Al_t of the environment, and the oxygen supply task state information in the plateau environment, namely the user's blood oxygen saturation X_t;
This information is the input of the data preprocessing module, which mainly performs data sampling and noise reduction; the sensor data can be sampled and denoised at a fixed interval of seconds, and the final denoised data is the module's output, namely the environment state matrix S_t;
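As a concrete but hypothetical instance of the sampling-plus-denoising step, a moving average over the last few sensor readings yields the state vector S_t; the patent does not name a specific filter, so the window size and averaging choice here are assumptions:

```python
import numpy as np

def preprocess(samples, window=5):
    """Sketch of the preprocessing module: average the last `window`
    sensor readings (rows of [altitude, temperature, SpO2]) into the
    state matrix S_t, suppressing per-sample sensor noise."""
    arr = np.asarray(samples, dtype=float)
    return arr[-window:].mean(axis=0)
```

Any low-pass filter would serve the same purpose; the averaging window trades responsiveness against noise suppression.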
The decision maker receives the matrix S_t and provides it to the neural network. The 3 neurons of the input layer receive the altitude, temperature and blood oxygen saturation of the current state respectively; the data then pass through the 2 hidden layers and finally reach the output layer, each neuron of which outputs one entry of the set Q(S_t, A) of estimated values of each action at time t. At the same time, the neural network takes the blood oxygen saturation in S_t as the environment's reward for the previous action, used to optimize the parameters of the neural network;
From the set Q(S_t, A) output by the neural network, the decision maker finds the entry Q(S_t, a'_t) with the largest expected value; the corresponding optimal action a'_t is sent, as the model's output, to an external task controller for execution. After the action is taken, if the task is not completed, return to step 1 and continue; if the task is completed, the model's work ends.
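The greedy selection of a'_t from Q(S_t, A), together with a possible blood-oxygen-based reward, might look as follows. The reward shape (penalising distance from a target saturation) is purely an assumption, since the patent only states that blood oxygen saturation serves as the reward:

```python
import numpy as np

def select_action(q_values, actions):
    """Greedy choice: the action a'_t whose estimate Q(S_t, a'_t) is largest."""
    idx = int(np.argmax(q_values))
    return actions[idx]

def spo2_reward(spo2, target=0.95):
    """Hypothetical reward: penalize distance of the user's blood oxygen
    saturation from a target value (the exact function is not given)."""
    return -abs(spo2 - target)
```

Under this shaping, a reading closer to the target earns a higher (less negative) reward, which pushes the network toward actions that keep saturation near the target.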
Drawings
FIG. 1 is a flow chart of a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply provided by the invention;
fig. 2 is a schematic structural diagram of the lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply provided by the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Example 1
The invention provides a method for constructing a lightweight reinforcement learning model for plateau scene intelligent oxygen supply, which is shown in reference to fig. 1 and 2 and comprises the following steps:
s1: receiving plateau environmental state information of current oxygen supply;
s2: preprocessing the plateau environmental state information to obtain an environmental state matrix;
s3: obtaining a profit estimation value set of each action by utilizing a neural network according to the environment state matrix;
s4: acquiring the optimal action in the income estimation value set of each action;
s5: judging whether the optimal action is a preset optimal action or not, if so, ending the current oxygen supply and sending the optimal action to an external task controller; otherwise, return to step S1.
Optionally, in step S1, the current oxygen supply environmental status information includes: environmental data and task data.
Optionally, the environmental data includes: altitude and temperature and humidity; and/or the task data comprises blood oxygen saturation, heart rate parameters and respiratory parameters.
Alternatively, the step S2 includes: sampling the environment state information; and carrying out noise reduction operation on the sampled environmental state information to obtain the environmental state matrix.
Optionally, in step S3, the neural network includes an input layer, a fully-connected layer and an output layer, the input layer is used for inputting the environment state matrix, the output layer outputs the profit estimation value sets of the actions, and the fully-connected layer simultaneously connects the input layer and the output layer.
The invention has the following beneficial effects:
the reinforcement learning model can adapt well to the plateau environment: it learns better actions from the special environment states of the plateau, such as altitude, temperature and oxygen content, so that the total benefit is highest, guaranteeing the accuracy of the decision result; and it adopts a lightweight neural network, i.e. one with fewer parameters and layers, reducing the computation so that the model can run well on embedded devices.
Example 2
The invention provides a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply. The method uses reinforcement learning from the field of artificial intelligence: it can comprehensively consider the many factors of the extreme plateau environment, improving the accuracy of the model while reducing the computation as far as possible under the constraint of making correct decisions, and completing the oxygen supply task efficiently.
The core adopts the Q-learning algorithm. Because the state of the plateau scene is complex, Q-learning needs a Q-value function to evaluate the value of taking a given action in a given environment state, and the original Q-learning algorithm stores the Q-value of every state in a table. Since the environment states of a plateau scene admit very many possibilities, table storage would occupy a large space, cost much, and query slowly; to solve this problem, a neural network is adopted to approximate the traditional Q-value distribution.
The invention has the following beneficial effects: it can adapt well to the plateau environment, learning better actions from the special environment states of the plateau, such as altitude, temperature and oxygen content, so that the total benefit is highest, thereby guaranteeing the accuracy of decision results; and it adopts a lightweight neural network, i.e. fewer parameters and layers, to reduce the model's computation, so it can run well on embedded devices.
The working process of the invention is as follows:
receiving as input the data returned by the environment state sensors and the task state sensors: the environment state information, such as the temperature T_t of the environment at time t and the altitude Al_t of the environment, and the oxygen supply task state information in the plateau environment, namely the user's blood oxygen saturation X_t;
This information is the input of the data preprocessing module, which mainly performs data sampling and noise reduction; the sensor data can be sampled and denoised at a fixed interval of seconds, and the final denoised data is the module's output, namely the environment state matrix S_t;
The decision maker receives the matrix S_t and provides it to the neural network: on the one hand, S_t serves as the input of the network, which finally outputs the set Q(S_t, A) of estimated values of each action taken at time t; on the other hand, S_t serves as the reward used to optimize the parameters of the neural network;
From the set Q(S_t, A) output by the neural network, the decision maker finds the entry Q(S_t, a'_t) with the largest expected value; the corresponding optimal action a'_t is sent, as the model's output, to an external task controller for execution. After the action is taken, if the task is not completed, return to the start of the workflow and continue; if the task is completed, the model's work ends.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (5)
1. A lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply, characterized by comprising the following steps:
s1: receiving plateau environmental state information of current oxygen supply;
s2: preprocessing the plateau environmental state information to obtain an environmental state matrix;
s3: obtaining a profit estimation value set of each action by utilizing a neural network according to the environment state matrix;
s4: acquiring the optimal action in the income estimation value set of each action;
s5: judging whether the optimal action is a preset optimal action or not, if so, ending the current oxygen supply and outputting a lightweight reinforcement learning model; otherwise, return to step S1.
2. The method for constructing a light weight reinforcement learning model for plateau scene intelligent oxygen supply according to claim 1, wherein in step S1, the current oxygen supply environment state information includes: environmental data and task data.
3. The plateau scene intelligent oxygen supply-oriented lightweight reinforcement learning model construction method according to claim 2, wherein the environmental data includes: altitude and temperature and humidity; and/or
The task data includes blood oxygen saturation, heart rate parameters, and respiratory parameters.
4. The plateau scene intelligent oxygen supply-oriented lightweight reinforcement learning model construction method of claim 1, wherein the step S2 includes:
sampling the environment state information;
and carrying out noise reduction operation on the sampled environmental state information to obtain the environmental state matrix.
5. The method for constructing a light weight reinforcement learning model for plateau scene intelligent oxygen supply according to claim 1, wherein in step S3, the neural network includes an input layer, a full connection layer and an output layer, the input layer is used for inputting the environment state matrix, the output layer outputs the profit estimation value set of each action, and the full connection layer connects the input layer and the output layer at the same time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111211867.8A CN113947194A (en) | 2021-10-18 | 2021-10-18 | Lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113947194A true CN113947194A (en) | 2022-01-18 |
Family
ID=79331374
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |