CN113947194A - Lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply - Google Patents
- Publication number: CN113947194A
- Application number: CN202111211867.8A
- Authority: CN (China)
- Prior art keywords: oxygen supply, plateau, reinforcement learning, learning model, action
- Prior art date: 2021-10-18
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- C—CHEMISTRY; METALLURGY
- C01—INORGANIC CHEMISTRY
- C01B—NON-METALLIC ELEMENTS; COMPOUNDS THEREOF; METALLOIDS OR COMPOUNDS THEREOF NOT COVERED BY SUBCLASS C01C
- C01B13/00—Oxygen; Ozone; Oxides or hydroxides in general
- C01B13/02—Preparation of oxygen
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Inorganic Chemistry (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The invention discloses a lightweight reinforcement learning model construction method for intelligent oxygen supply in plateau scenes, comprising the following steps: inputting the environment state, preprocessing the data, deciding an output action with a neural network, receiving the reward fed back by the environment, and updating the parameters of the neural network. The method comprehensively considers the many factors of extreme environments such as plateaus, improving the accuracy of the model while reducing its computational cost as far as possible under the constraint of making correct decisions; it completes the oxygen supply task efficiently and can serve as the model basis for intelligently controlling the oxygen output of an oxygen supply system.
Description
Technical Field
The invention relates to the technical field of machine learning, and in particular to a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply.
Background
Altitude sickness is the set of uncomfortable symptoms that appear after a person ascends to a plateau above 3000 meters and is exposed to its low-pressure, low-oxygen environment; it is a common condition unique to plateau regions. Its harm to the human body is considerable, and reducing it has great psychological and physiological significance, so a portable, intelligent oxygen supply system is urgently needed for people working at high altitude. The reinforcement learning techniques of machine learning can be adopted, letting an agent gradually adapt to the environment during training so as to obtain the best overall benefit.
Invention patent 201210307733.0 describes a portable oxygen generator suitable for plateau areas, which adds a series of intelligent judgment devices to a conventional oxygen generator to adjust the output oxygen flow and form a pulsed oxygen supply. However, that device is rather bulky, unsuitable for a single person to carry, and usable only in non-mobile scenarios.
Disclosure of Invention
The invention provides a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply. The model uses reinforcement learning from the field of artificial intelligence: it can comprehensively consider the many factors of the extreme plateau environment, improving the accuracy of the model while reducing its computational cost as far as possible under the constraint of making correct decisions; it completes the oxygen supply task efficiently and can serve as a model for intelligently controlling the oxygen output of an oxygen supply system.
The invention provides a method for constructing a lightweight reinforcement learning model for plateau scene intelligent oxygen supply, which comprises the following steps:
s1: receiving plateau environmental state information of current oxygen supply;
s2: preprocessing the plateau environmental state information to obtain an environmental state matrix;
s3: obtaining a profit estimation value set of each action by utilizing a neural network according to the environment state matrix;
s4: acquiring the optimal action in the income estimation value set of each action;
s5: judging whether the optimal action is a preset optimal action or not, if so, ending the current oxygen supply and sending the optimal action to an external task controller; otherwise, return to step S1.
Optionally, in step S1, the current oxygen supply environmental status information includes: environmental data and task data.
Optionally, the environmental data includes: altitude and temperature and humidity; and/or the task data comprises blood oxygen saturation, heart rate parameters and respiratory parameters.
Alternatively, the step S2 includes: sampling the environment state information; and carrying out noise reduction operation on the sampled environmental state information to obtain the environmental state matrix.
Optionally, in step S3, the neural network includes an input layer, a fully-connected layer and an output layer, the input layer is used for inputting the environment state matrix, the output layer outputs the profit estimation value sets of the actions, and the fully-connected layer simultaneously connects the input layer and the output layer.
The invention has the following beneficial effects:
the reinforcement learning model can adapt well to the plateau environment: it learns better actions from the special environment states of the plateau, such as altitude, temperature and oxygen content, so that the total benefit is highest, guaranteeing the accuracy of the decision result; and it adopts a lightweight neural network, i.e. one with fewer parameters and layers, reducing the computation so that the model can run well on embedded devices.
The core of the model adopts the Q-learning algorithm. Because the state of the plateau scene is complex, Q-learning needs a Q-value function to evaluate the value of taking a given action in a given environment state, and the original Q-learning algorithm stores the Q-value of every state in a table. Since the environment states of a plateau scene admit very many possibilities, table storage would occupy a large space, cost much, and query slowly; to solve this problem, a neural network is adopted to approximate the traditional Q-value distribution.
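For contrast with the table-based variant that the paragraph above rejects, the standard tabular Q-learning update can be sketched as follows; the learning rate `alpha` and discount `gamma` are generic hyperparameters, not values given in the patent:

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    The patent replaces this table with a small network precisely
    because the plateau state space is too large to enumerate."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table
```

Approximating Q with a network keeps the same update target but swaps the table lookup for a forward pass and the table write for a gradient step.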
The input layer of the neural network is a fully connected layer of 3 neurons, receiving a one-dimensional matrix formed from the altitude, temperature and blood oxygen saturation of the current state; the output layer is a fully connected layer with one neuron for each action in the action set A, outputting a matrix of estimated values for each action. The network may contain 2 hidden layers, fully connected layers of 10 and 8 neurons respectively.
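The 3 → 10 → 8 → |A| architecture just described can be sketched in plain NumPy. The ReLU activations, the weight scale, and the action count are assumptions, since the patent specifies only the layer sizes:

```python
import numpy as np

class TinyQNet:
    """Minimal sketch of the described 3 -> 10 -> 8 -> |A| network.
    Weights are randomly initialised for illustration only."""
    def __init__(self, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [3, 10, 8, n_actions]
        self.layers = [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
                       for i, o in zip(sizes[:-1], sizes[1:])]

    def __call__(self, state):
        x = np.asarray(state, dtype=float)   # [altitude, temperature, SpO2]
        for k, (w, b) in enumerate(self.layers):
            x = x @ w + b
            if k < len(self.layers) - 1:     # ReLU on hidden layers only
                x = np.maximum(x, 0.0)
        return x                             # one Q estimate per action
```

With roughly 3·10 + 10·8 + 8·|A| weights, the network is small enough for the embedded deployment the patent targets.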
The model work flow of the invention is as follows:
the model receives as input the data returned by the environment state sensors and the task state sensors: the environment state information, such as the temperature T_t of the environment at time t and the altitude Al_t of the environment, and the oxygen supply task state information in the plateau environment, namely the user's blood oxygen saturation X_t;
This information is the input of the data preprocessing module, which mainly performs data sampling and noise reduction; the sensor data can be sampled and denoised at a fixed interval of seconds, and the final denoised data is the module's output, namely the environment state matrix S_t;
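As a concrete but hypothetical instance of the sampling-plus-denoising step, a moving average over the last few sensor readings yields the state vector S_t; the patent does not name a specific filter, so the window size and averaging choice here are assumptions:

```python
import numpy as np

def preprocess(samples, window=5):
    """Sketch of the preprocessing module: average the last `window`
    sensor readings (rows of [altitude, temperature, SpO2]) into the
    state matrix S_t, suppressing per-sample sensor noise."""
    arr = np.asarray(samples, dtype=float)
    return arr[-window:].mean(axis=0)
```

Any low-pass filter would serve the same purpose; the averaging window trades responsiveness against noise suppression.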
The decision maker receives the matrix S_t and provides it to the neural network. The 3 neurons of the input layer receive the altitude, temperature and blood oxygen saturation of the current state respectively; the data then pass through the 2 hidden layers and finally reach the output layer, each neuron of which outputs one entry of the set Q(S_t, A) of estimated values of each action at time t. At the same time, the neural network takes the blood oxygen saturation in S_t as the environment's reward for the previous action, used to optimize the parameters of the neural network;
From the set Q(S_t, A) output by the neural network, the decision maker finds the entry Q(S_t, a'_t) with the largest expected value; the corresponding optimal action a'_t is sent, as the model's output, to an external task controller for execution. After the action is taken, if the task is not completed, return to step 1 and continue; if the task is completed, the model's work ends.
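The greedy selection of a'_t from Q(S_t, A), together with a possible blood-oxygen-based reward, might look as follows. The reward shape (penalising distance from a target saturation) is purely an assumption, since the patent only states that blood oxygen saturation serves as the reward:

```python
import numpy as np

def select_action(q_values, actions):
    """Greedy choice: the action a'_t whose estimate Q(S_t, a'_t) is largest."""
    idx = int(np.argmax(q_values))
    return actions[idx]

def spo2_reward(spo2, target=0.95):
    """Hypothetical reward: penalize distance of the user's blood oxygen
    saturation from a target value (the exact function is not given)."""
    return -abs(spo2 - target)
```

Under this shaping, a reading closer to the target earns a higher (less negative) reward, which pushes the network toward actions that keep saturation near the target.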
Drawings
FIG. 1 is a flow chart of a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply provided by the invention;
fig. 2 is a schematic structural diagram of the lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply provided by the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Example 1
The invention provides a method for constructing a lightweight reinforcement learning model for plateau scene intelligent oxygen supply, which is shown in reference to fig. 1 and 2 and comprises the following steps:
s1: receiving plateau environmental state information of current oxygen supply;
s2: preprocessing the plateau environmental state information to obtain an environmental state matrix;
s3: obtaining a profit estimation value set of each action by utilizing a neural network according to the environment state matrix;
s4: acquiring the optimal action in the income estimation value set of each action;
s5: judging whether the optimal action is a preset optimal action or not, if so, ending the current oxygen supply and sending the optimal action to an external task controller; otherwise, return to step S1.
Optionally, in step S1, the current oxygen supply environmental status information includes: environmental data and task data.
Optionally, the environmental data includes: altitude and temperature and humidity; and/or the task data comprises blood oxygen saturation, heart rate parameters and respiratory parameters.
Alternatively, the step S2 includes: sampling the environment state information; and carrying out noise reduction operation on the sampled environmental state information to obtain the environmental state matrix.
Optionally, in step S3, the neural network includes an input layer, a fully-connected layer and an output layer, the input layer is used for inputting the environment state matrix, the output layer outputs the profit estimation value sets of the actions, and the fully-connected layer simultaneously connects the input layer and the output layer.
The invention has the following beneficial effects:
the reinforcement learning model can adapt well to the plateau environment: it learns better actions from the special environment states of the plateau, such as altitude, temperature and oxygen content, so that the total benefit is highest, guaranteeing the accuracy of the decision result; and it adopts a lightweight neural network, i.e. one with fewer parameters and layers, reducing the computation so that the model can run well on embedded devices.
Example 2
The invention provides a lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply. The method uses reinforcement learning from the field of artificial intelligence: it can comprehensively consider the many factors of the extreme plateau environment, improving the accuracy of the model while reducing the computation as far as possible under the constraint of making correct decisions, and completing the oxygen supply task efficiently.
The core adopts the Q-learning algorithm. Because the state of the plateau scene is complex, Q-learning needs a Q-value function to evaluate the value of taking a given action in a given environment state, and the original Q-learning algorithm stores the Q-value of every state in a table. Since the environment states of a plateau scene admit very many possibilities, table storage would occupy a large space, cost much, and query slowly; to solve this problem, a neural network is adopted to approximate the traditional Q-value distribution.
The invention has the following beneficial effects: it can adapt well to the plateau environment, learning better actions from the special environment states of the plateau, such as altitude, temperature and oxygen content, so that the total benefit is highest, thereby guaranteeing the accuracy of decision results; and it adopts a lightweight neural network, i.e. fewer parameters and layers, to reduce the model's computation, so it can run well on embedded devices.
The working process of the invention is as follows:
receiving as input the data returned by the environment state sensors and the task state sensors: the environment state information, such as the temperature T_t of the environment at time t and the altitude Al_t of the environment, and the oxygen supply task state information in the plateau environment, namely the user's blood oxygen saturation X_t;
This information is the input of the data preprocessing module, which mainly performs data sampling and noise reduction; the sensor data can be sampled and denoised at a fixed interval of seconds, and the final denoised data is the module's output, namely the environment state matrix S_t;
The decision maker receives the matrix S_t and provides it to the neural network: on the one hand, S_t serves as the input of the network, which finally outputs the set Q(S_t, A) of estimated values of each action taken at time t; on the other hand, S_t serves as the reward used to optimize the parameters of the neural network;
From the set Q(S_t, A) output by the neural network, the decision maker finds the entry Q(S_t, a'_t) with the largest expected value; the corresponding optimal action a'_t is sent, as the model's output, to an external task controller for execution. After the action is taken, if the task is not completed, return to the start of the workflow and continue; if the task is completed, the model's work ends.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (5)
1. A lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply, characterized by comprising the following steps:
s1: receiving plateau environmental state information of current oxygen supply;
s2: preprocessing the plateau environmental state information to obtain an environmental state matrix;
s3: obtaining a profit estimation value set of each action by utilizing a neural network according to the environment state matrix;
s4: acquiring the optimal action in the income estimation value set of each action;
s5: judging whether the optimal action is a preset optimal action or not, if so, ending the current oxygen supply and outputting a lightweight reinforcement learning model; otherwise, return to step S1.
2. The method for constructing a light weight reinforcement learning model for plateau scene intelligent oxygen supply according to claim 1, wherein in step S1, the current oxygen supply environment state information includes: environmental data and task data.
3. The plateau scene intelligent oxygen supply-oriented lightweight reinforcement learning model construction method according to claim 2, wherein the environmental data includes: altitude and temperature and humidity; and/or
The task data includes blood oxygen saturation, heart rate parameters, and respiratory parameters.
4. The plateau scene intelligent oxygen supply-oriented lightweight reinforcement learning model construction method of claim 1, wherein the step S2 includes:
sampling the environment state information;
and carrying out noise reduction operation on the sampled environmental state information to obtain the environmental state matrix.
5. The method for constructing a light weight reinforcement learning model for plateau scene intelligent oxygen supply according to claim 1, wherein in step S3, the neural network includes an input layer, a full connection layer and an output layer, the input layer is used for inputting the environment state matrix, the output layer outputs the profit estimation value set of each action, and the full connection layer connects the input layer and the output layer at the same time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111211867.8A CN113947194A (en) | 2021-10-18 | 2021-10-18 | Lightweight reinforcement learning model construction method for plateau scene intelligent oxygen supply |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113947194A true CN113947194A (en) | 2022-01-18 |
Family
ID=79331374
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |