CN110989735A - Self-adaptive adjustment method and device for sleep environment and electronic equipment - Google Patents

Self-adaptive adjustment method and device for sleep environment and electronic equipment

Info

Publication number
CN110989735A
CN110989735A (application CN201911102130.5A)
Authority
CN
China
Prior art keywords
network model
state
sleep
environment
current
Prior art date
Legal status
Pending
Application number
CN201911102130.5A
Other languages
Chinese (zh)
Inventor
李绍斌
宋德超
陈翀
罗晓宇
陈向文
岳冬
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Lianyun Technology Co Ltd filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN201911102130.5A
Publication of CN110989735A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D27/00 - Simultaneous control of variables covered by two or more of main groups G05D1/00 - G05D25/00
    • G05D27/02 - Simultaneous control of variables covered by two or more of main groups G05D1/00 - G05D25/00 characterised by the use of electric means
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The application relates to a sleep environment self-adaptive adjustment method, a sleep environment self-adaptive adjustment device and electronic equipment. The method comprises the following steps: acquiring a first environment state and a current sleep state of a user among a plurality of sleep states; inputting the current sleep state and the first environment state into a current value network model and acquiring a first environment adjustment parameter output by the current value network model; and adjusting the first environment state to a second environment state based on the first environment adjustment parameter. By training a neural network model on the person's real-time sleep state information and the real-time indoor temperature and humidity information, the application adjusts the indoor temperature and humidity through feedback control. This solves the problem of how to adjust the environment state in real time to adapt to the person's sleep state and thereby improve sleep quality, so that the indoor environment changes in a direction that helps the person sleep and the person's sleep quality is improved.

Description

Self-adaptive adjustment method and device for sleep environment and electronic equipment
Technical Field
The application relates to the technical field of smart home, in particular to a sleep environment self-adaptive adjusting method and device and electronic equipment.
Background
Sleep is vital to people's normal life, and good sleep makes daytime work more effective. However, the sleep environment can cause a person's sleep quality to decline and thus affect daily life. With the development of artificial intelligence technology, how to apply artificial intelligence so that the environment state adapts to a person's sleep state during sleep, thereby improving sleep quality, has become a focus of current research.
Disclosure of Invention
In order to solve the problem of how to adjust the environment state in real time to adapt to a person's sleep state and thereby improve sleep quality, the application provides a sleep environment self-adaptive adjustment method, a sleep environment self-adaptive adjustment device and electronic equipment.
In a first aspect, the present application provides a sleep environment adaptive adjustment method, including:
acquiring a first environment state and a current sleep state of a user in a plurality of sleep states, wherein the plurality of sleep states are sleep states with different grades;
inputting a current sleep state and a first environment state into a current value network model, and acquiring a first environment adjustment parameter output by the current value network model, wherein the current value network model is a neural network model obtained by training through reinforcement learning;
and adjusting the first environment state to a second environment state based on the first environment adjustment parameter, wherein the second environment state is used for adjusting the sleep state of the user from the current sleep state to the target sleep state, and the current sleep state and the target sleep state are different in grade.
Optionally, after obtaining the first environment adjustment parameter output by the current value network model, the method further includes:
obtaining an expected value output by a current value network model, wherein the expected value is used for expressing the success probability of adjusting the sleep state of the user from the current sleep state to the target sleep state through a first environment adjustment parameter;
and saving the current sleep state, the first environment adjustment parameter, the expected value and the actual sleep state as a record in a storage memory unit, wherein the actual sleep state is the sleep state to which the sleep state of the user is actually adjusted from the current sleep state.
Optionally, after adjusting the first environmental state to the second environmental state based on the first environmental adjustment parameter, the method further includes:
under the condition that the number of records in the storage memory unit reaches a target threshold value, performing multiple updating operations on the values of the parameters in the current value network model by using a target value network model, wherein the target value network model and the current value network model have the same structure, and the values of the parameters in the target value network model are the same as the values of the parameters in the current value network model before the multiple updating operations are performed;
and after the values of the parameters in the current value network model are subjected to multiple updating operations, transmitting the values of the parameters in the current value network model to the parameters in the target value network model.
Optionally, performing a plurality of update operations on the values of the parameters in the current value network model using the target value network model comprises performing each update operation as follows:
selecting a plurality of records from a storage memory unit;
acquiring a first expected value of the current value network model and a second expected value of the target value network model, wherein the first expected value is the maximum value of expected values output by the current value network model when a plurality of records are input, and the second expected value is the maximum value of expected values output by the target value network model when the plurality of records are input;
and updating the parameters in the current value network model by using the first expected value and the second expected value.
Optionally, the method further comprises determining whether the number of records in the storage memory unit reaches the target threshold by:
and counting whether the number of records stored in the storage memory unit after the step of transferring the value of the parameter in the current value network model to the parameter in the target value network model is executed in the previous time reaches a target threshold value.
Optionally, after passing the values of the parameters in the current-value network model to the parameters in the target-value network model, the method further comprises:
and continuously determining the environment adjustment parameters corresponding to the sleep state and the environment state of the user by using the current value network model.
Optionally, adjusting the first environmental state to the second environmental state based on the first environmental adjustment parameter includes:
and adjusting the temperature and the humidity according to the first environment adjustment parameter, wherein the first environment state comprises the temperature and the humidity of the household appliance before adjustment, and the second environment state comprises the temperature and the humidity of the household appliance after adjustment.
In a second aspect, the present application provides a sleep environment adaptive adjustment apparatus, including:
the system comprises a state acquisition module, a state acquisition module and a sleep state acquisition module, wherein the state acquisition module is used for acquiring a first environment state and the current sleep state of a user in a plurality of sleep states, and the sleep states are sleep states with different grades;
the state transmission module is used for inputting the current sleep state and the first environment state into a current value network model and acquiring a first environment adjustment parameter output by the current value network model, wherein the current value network model is a neural network model obtained by training through reinforcement learning;
and the state adjusting module is used for adjusting the first environment state to a second environment state based on the first environment adjusting parameter, wherein the second environment state is used for adjusting the sleep state of the user from the current sleep state to the target sleep state, and the current sleep state and the target sleep state are different in grade.
In another aspect, the present application provides an electronic device including a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor implements the steps of the method when executing the program.
In another aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method described above.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
according to the method provided by the embodiment of the application, the real-time sleep state information of a person and the real-time indoor temperature and humidity information are trained through the neural network model to feed back and adjust the indoor temperature and humidity, so that the problem of how to adjust the environmental state in real time to adapt to the sleep state of the person to improve the sleep quality is solved, the indoor environment is changed towards the direction which is helpful for the sleep of the person, and the sleep quality of the person is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a sleep environment adaptive adjustment method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a temperature and humidity adaptive control strategy provided in an embodiment of the present application;
FIG. 3 is a flow chart of a reinforcement learning algorithm tuning provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a neural network provided in an embodiment of the present application;
fig. 5 is a block diagram of a sleep environment adaptive adjustment apparatus according to an embodiment of the present application;
fig. 6 is a schematic view of an internal structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a flowchart of a sleep environment adaptive adjustment method provided in an embodiment of the present application, and as shown in fig. 1, the method includes:
s1, acquiring the current sleep state of the user in the first environment state and a plurality of sleep states, wherein the plurality of sleep states are sleep states with different levels.
Specifically, a plurality of pressure sensors, vital sign sensors and the like are arranged in the mattress of the intelligent bedroom, and the sleep state of the user can be acquired from the detection information of these sensors using a corresponding algorithm. For example, when the physical sign information is a body surface humidity parameter, the user's body surface humidity can be acquired directly through a humidity sensor. For another example, when the physical sign information is a sleeping posture parameter, the user's body can be sensed through the pressure sensors arranged in the mattress, the coverage range of the user's body on the mattress is determined from the positions of the pressure sensors that sense the body, and the sleeping posture is determined from a preset correspondence between coverage ranges and sleeping postures.
The acquisition period of the physical sign information can be determined according to the specific situation. For example, a sleep period (e.g., 10 pm to 7 am) may be set and the physical sign information acquired within this period by periodic detection. The detection period may be a fixed time, or different acquisition frequencies may be set for different time segments. In the first half of the night the user sleeps less, so the acquisition frequency may be increased to change the bedroom environment more quickly and help the user enter deep sleep sooner; in the second half of the night, when the user is generally asleep, the acquisition frequency may be reduced to lower energy consumption. The physical sign information may also be acquired in real time, which allows the bedroom environment to be adjusted more accurately and quickly.
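As a concrete illustration of such a time-dependent acquisition schedule, the following minimal Python sketch varies the sampling interval with the time of night. The interval values, the 2 am boundary between the two halves of the night and the function name are illustrative assumptions and are not taken from the application.

```python
from datetime import datetime, time
from typing import Optional

# Illustrative acquisition intervals in seconds; the application does not give
# concrete values, so these numbers are assumptions for this sketch.
FIRST_HALF_INTERVAL = 60     # sample more often while the user is falling asleep
SECOND_HALF_INTERVAL = 300   # sample less often once the user is likely asleep

def acquisition_interval(now: datetime) -> Optional[int]:
    """Return the sampling interval for the current time, or None outside the
    configured sleep period (10 pm to 7 am)."""
    t = now.time()
    in_sleep_period = t >= time(22, 0) or t < time(7, 0)
    if not in_sleep_period:
        return None
    # Treat 10 pm to 2 am as the first half of the night and 2 am to 7 am as the
    # second half; the 2 am boundary is itself an assumption.
    first_half = t >= time(22, 0) or t < time(2, 0)
    return FIRST_HALF_INTERVAL if first_half else SECOND_HALF_INTERVAL
```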
After the physical sign information of the user is determined, the physical sign parameters (for example, one or more of a body surface humidity parameter, a breathing depth parameter, a breathing frequency parameter and a sleeping posture parameter) are obtained from it, the obtained physical sign parameters are compared with the corresponding pre-stored preset physical sign parameters, and the user's sleep state is determined from the comparison result according to a preset strategy. The preset strategy is a method that a person skilled in the art can set adaptively according to the physical sign parameters contained in the physical sign information and is used to determine the user's sleep state from the comparison result; this embodiment does not specifically limit the method. For example, the body surface humidity parameter during sleep may be 60 and the breathing rate parameter 80 times/minute, while the preset body surface humidity parameter may be 20 and the preset breathing rate parameter 60 times/minute.
When the user adjusts the sleeping posture multiple times within a short period, it can be judged that the user's sleep quality is low or that the user has not fallen asleep. If the user does not adjust the sleeping posture within a short period, it can be judged that the user is asleep, or the user may be lying in bed watching videos or the like; whether the user is actually in a sleep state can then be judged from parameters such as the user's breathing frequency and breathing depth.
In the embodiments of the present application, the sleep state is divided into three levels: no sleep, light sleep and deep sleep, and the purpose of the present application is to help the user enter the deep sleep state while sleeping.
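To illustrate how such a preset strategy could map physical sign parameters to the three levels, the following Python sketch compares measured parameters against preset thresholds. Only the example values 20 and 60 appear in the description above; the decision rules, the posture-change threshold and all names are assumptions for illustration, not the application's strategy.

```python
from dataclasses import dataclass

# Preset comparison values mentioned in the description above; the decision
# rules built on them below are assumptions for this sketch.
PRESET_BODY_SURFACE_HUMIDITY = 20
PRESET_BREATHING_RATE = 60  # breaths per minute

@dataclass
class SignParameters:
    body_surface_humidity: float
    breathing_rate: float            # breaths per minute
    posture_changes_last_10min: int  # hypothetical posture-change count

def classify_sleep_state(signs: SignParameters) -> str:
    """Map physical sign parameters to one of the three levels used in the
    application: 'no_sleep', 'light_sleep' or 'deep_sleep'."""
    # Frequent posture changes in a short time suggest the user is not asleep.
    if signs.posture_changes_last_10min >= 3:
        return "no_sleep"
    # Slow, steady breathing and low body surface humidity suggest deep sleep.
    if (signs.breathing_rate <= PRESET_BREATHING_RATE
            and signs.body_surface_humidity <= PRESET_BODY_SURFACE_HUMIDITY):
        return "deep_sleep"
    return "light_sleep"
```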
And S2, inputting the current sleep state and the first environment state into a current value network model, and acquiring a first environment adjustment parameter output by the current value network model, wherein the current value network model is a neural network model obtained by training through reinforcement learning.
Specifically, different sleep states may be caused by different environmental factors, so the environmental factors that need to be adjusted may differ between sleep states, and the means of adjusting different environmental factors may also differ; the correspondence between sleep states and environmental factors can be obtained and set by a person skilled in the art through investigation, research, experiment and the like. In this embodiment, the temperature and humidity in the bedroom are described as the environmental factors affecting sleep quality.
Fig. 2 is a schematic diagram of a temperature and humidity adaptive control strategy provided in an embodiment of the present disclosure. As shown in fig. 2, the bedroom environment is adjusted according to an evaluation function and a reward-penalty strategy: the bedroom temperature and humidity are changed, and the influence of different temperature and humidity levels on the user's sleep is detected. According to the evaluation function, a reward is given when an adjustment helps the user sleep and a penalty when it does not, so that the temperature and humidity are continuously adjusted in the direction that helps sleep, and the optimal temperature and humidity control strategy suited to a specific person's sleep is finally learned.
Regarding the evaluation function, the present application selects a function model containing a reinforcement-learning neural network model (DQN); the specific algorithm flow is shown in fig. 3. The DQN algorithm structure mainly comprises two deep neural networks, a memory storage table and an error function. The person's sleep state is taken as the reward in the reinforcement learning algorithm, temperature and humidity control is taken as the action, and adaptive control is learned to improve the person's sleep quality. As shown in fig. 4, the two deep neural networks may each be a three-layer (or other number of layers) deep neural network (DNN) model. The input layer is the person's sleep state: no sleep (reward of 0), light sleep (small reward) or deep sleep (large reward). After processing by the hidden layer, the output layer gives the expected values (Q values) corresponding to different actions, where the actions are temperature and humidity settings treated as discrete values; each time a sleep state is input, a Q value is output, and the higher the Q value, the more the corresponding action is expected. Each row of the memory storage table stores the current sleep state, the action, the reward and the next sleep state; the size of the table is fixed, and when the number of stored rows exceeds this size, the oldest stored rows are replaced in order.
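The following Python sketch (using PyTorch) mirrors this structure: two deep neural networks of identical structure, a fixed-size memory storage table whose oldest rows are replaced, and the reward mapping from sleep states. The layer sizes, the number of discrete temperature/humidity actions, the 2000-row table size and the numeric rewards for light and deep sleep are assumptions for illustration; the application only fixes the overall structure, not these values.

```python
import collections
import torch.nn as nn

# Reward mapping described in the application: no sleep -> 0, light sleep -> a
# small reward, deep sleep -> a large reward. The values 0.3 and 1.0 are assumptions.
REWARDS = {"no_sleep": 0.0, "light_sleep": 0.3, "deep_sleep": 1.0}

def build_q_network(state_dim: int, n_actions: int) -> nn.Sequential:
    """Three-layer DNN mapping a (sleep state, environment state) vector to one
    Q value per discrete temperature/humidity action. Layer widths are assumptions."""
    return nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )

# Current value network and target value network share the same structure;
# the target network starts with the same parameter values as the current one.
current_net = build_q_network(state_dim=3, n_actions=9)
target_net = build_q_network(state_dim=3, n_actions=9)
target_net.load_state_dict(current_net.state_dict())

# Memory storage table with a fixed size; once full, the oldest row is replaced.
# Each row stores (current state, action, reward, next state).
memory = collections.deque(maxlen=2000)
```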
For example, assume the current air conditioner temperature is 20 degrees and the user's sleep state is light sleep. The user's sleep state is input into the DNN model with its current parameter values, and the corresponding Q value is output through the training function in the model. If the Q value is found to be low, the air conditioner temperature is adjusted accordingly, for example to 22 degrees, and the user's sleep state is detected again. If the user's sleep state is better, the reward in the evaluation function increases; the new sleep state data is input into the DNN model, and the Q value now output is higher than the Q value at 20 degrees, which indicates that the direction of temperature adjustment is right, so adjustment continues in the direction of increasing temperature.
As shown in fig. 3, the parameters of the two DNN models are first initialized, and the current action (temperature and humidity) and the current user's sleep state data are obtained as the input of the current value network model. From the values output by the current value network model, a suitable value is selected as the expected value (the predicted Q value in fig. 3) by the selection method of the Q-learning algorithm. The approximate selection rule is as follows: after the user's sleep state data is input, different actions (temperature and humidity adjustments) are considered in the algorithm, each producing an output value; with a probability of 90%, the largest of these output values is selected as the predicted Q value, and the action corresponding to that predicted Q value is selected as the adjustment action for the current sleep state.
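This selection rule reads like an epsilon-greedy policy with a 90% exploitation probability. A minimal sketch, reusing the network definition from the previous sketch, might look as follows; the application does not state what happens in the remaining 10%, so the random-exploration branch is an assumption.

```python
import random
import torch
import torch.nn as nn

def select_action(net: nn.Sequential, state: torch.Tensor,
                  exploit_prob: float = 0.9) -> int:
    """With 90% probability pick the action with the largest predicted Q value
    (the predicted Q value in fig. 3); otherwise pick a random action
    (the exploration branch is an assumption)."""
    n_actions = net[-1].out_features  # assumes the last layer is nn.Linear, as above
    if random.random() < exploit_prob:
        with torch.no_grad():
            q_values = net(state.unsqueeze(0)).squeeze(0)  # one Q value per action
        return int(torch.argmax(q_values).item())
    return random.randrange(n_actions)
```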
For example, taking temperature in this embodiment, assume the current air conditioner temperature is 20 degrees and the user's sleep condition is light sleep, so the reward is small. The user's sleep state is input into the current value network model, which predicts several adjustment options, for example adjusting the temperature to 19 degrees, 21 degrees or 22 degrees, each corresponding to one predicted output value. With 90% probability the algorithm selects the largest of these output values as the expected value (predicted Q value) to output from the model; if the output value for 22 degrees is found to be higher than those for 19 and 21 degrees, the expected value corresponding to 22 degrees is output. The air conditioner is then controlled to adjust to 22 degrees, the user's sleep state after the temperature adjustment is acquired, and the user's current light-sleep state, 22 degrees, the reward obtained from the change in the user's sleep state after the adjustment and the user's actual sleep state after the adjustment are stored in the memory storage table as one row.
The above steps are repeated: the user's sleep state after the adjustment to 22 degrees is input into the current value network model, the model again predicts several adjustment options, and the largest output value is finally found to correspond to 24 degrees. The output value for 24 degrees is taken as the predicted Q value, the Q value corresponding to 24 degrees is taken as the Q value corresponding to 22 degrees, and the reward corresponding to 24 degrees is taken as the reward corresponding to 22 degrees at that moment. The air conditioner is adjusted to 24 degrees, the user's actual sleep state after the adjustment is obtained, and the user's current sleep state, the rewards corresponding to 24 degrees and 22 degrees and the actual sleep state corresponding to 24 degrees are stored in the memory storage table as one row. The above steps are repeated a preset number of times, which is set to 2000 in this embodiment.
If the optimal adjustment temperature is learned before the steps have been repeated 2000 times, learning stops. For example, at the 1600th repetition it may be found that the Q value obtained at the 1599th repetition is larger than the Q values obtained at the 1598th and 1600th repetitions; the temperature adjustment of the 1599th repetition then improves the user's sleep quality the most, and any further temperature change reduces it, so the action (temperature) corresponding to the maximum Q value, for example 25.8 degrees, is selected as the most comfortable temperature for the user's sleep. Humidity is adjusted in the same way as temperature.
And S3, adjusting the first environment state to a second environment state based on the first environment adjustment parameter, wherein the second environment state is used for adjusting the sleep state of the user from the current sleep state to the target sleep state, and the current sleep state and the target sleep state are different in level.
Specifically, when the number of repetitions reaches 2000, 2000 rows of data are stored in the memory storage table. A set number of data samples, 50 rows in this embodiment, are randomly selected from the table and input into the current value network model and the target value network model respectively; the initial parameters of the two models are consistent, the current value network model is used to predict actions, and the target value network model serves as the real action label. The current sleep state and the predicted action of each of the 50 rows are input into the current value network model, the actual sleep state of the user after the predicted action was executed in each of the 50 rows is input into the target value network model, the outputs of both are passed to the error function, the difference between the actual maximum Q value and the predicted maximum Q value is calculated, and this difference is used as the error of the current value network model to update its parameters.
The random selection of 50 rows from the 2000 rows of data and the above operation are repeated, that is, the parameters of the current value network model are updated a set number of times, which is set to 30 (or 50, etc.) in this embodiment and can also be set according to the actual situation. After the parameters of the current value network model have been updated 30 times, its training result is basically stable; at this point the parameters of the current value network model are assigned to the target value network model, replacing the parameters of the target value network model. It is then judged whether the current value network model selects the optimal actions so that the user reaches the final sleep effect; if so, training stops, otherwise another 2000 rows of data are acquired under the current parameters and 50-row batches are again selected for parameter correction, until the optimal effect is reached.
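Continuing the PyTorch sketch above (reusing current_net, target_net and memory defined there), one update operation of the kind described here might look as follows. The description speaks of the difference between the actual and the predicted maximum Q value as the error; this sketch uses the standard DQN form of that target, with a discount factor that is an assumption since the application does not state one, and it assumes each memory row holds (state tensor, action index, reward, next-state tensor).

```python
import random
import torch
import torch.nn.functional as F

GAMMA = 0.9            # discount factor (assumption; not specified in the application)
BATCH_ROWS = 50        # rows randomly selected from the memory storage table per update
UPDATES_PER_SYNC = 30  # after 30 updates, copy current-network parameters to the target network

optimizer = torch.optim.Adam(current_net.parameters(), lr=1e-3)

def update_current_network(step: int) -> None:
    """One update of the current value network from 50 randomly selected rows,
    following the DQN-style procedure described in the application."""
    batch = random.sample(list(memory), BATCH_ROWS)
    states = torch.stack([row[0] for row in batch])
    actions = torch.tensor([row[1] for row in batch])
    rewards = torch.tensor([row[2] for row in batch], dtype=torch.float32)
    next_states = torch.stack([row[3] for row in batch])

    # Predicted Q value: the current network's output for the stored action.
    predicted_q = current_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target Q value: reward plus the target network's maximum Q for the next state.
    with torch.no_grad():
        target_q = rewards + GAMMA * target_net(next_states).max(dim=1).values

    loss = F.mse_loss(predicted_q, target_q)  # error between target and predicted Q
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # After the set number of updates, pass the current-network parameters
    # to the target value network, as described above.
    if (step + 1) % UPDATES_PER_SYNC == 0:
        target_net.load_state_dict(current_net.state_dict())
```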
Because temperature and humidity of different magnitudes need to be applied during training to study their influence on sleep, the adjustment range of temperature and humidity is large in the initial training, and the impact on the person's sleep is therefore also large. The adjustment strategy can be learned in advance through experiments to determine an approximate temperature adjustment range, and after arriving at the user's home, self-adaptive learning is carried out on the basis of the originally learned strategy.
If several people are in the room, the temperature and humidity corresponding to each person's optimal sleep state may differ, though not by much. In this case, the optimal temperature and humidity of each person can be obtained through the evaluation function model and the average taken for adjustment; alternatively, the sleep states of the several people can be input into the evaluation function model for training, and the optimal value finally obtained is the combined optimal temperature and humidity for the group.
With the method in this embodiment, a neural network model is trained on the person's real-time sleep state information and the real-time indoor temperature and humidity information to adjust the indoor temperature and humidity through feedback, so that the environment better suits the person's sleep and the person's sleep quality is improved.
Fig. 5 is a block diagram of a sleep environment adaptive adjustment apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus includes:
a state obtaining module 51, configured to obtain a first environment state and a current sleep state of a user in multiple sleep states, where the multiple sleep states are sleep states with different levels;
the state transmission module 52 is configured to input the current sleep state and the first environment state into a current value network model, and acquire a first environment adjustment parameter output by the current value network model, where the current value network model is a neural network model obtained by training through reinforcement learning;
and a state adjusting module 53, configured to adjust the first environment state to a second environment state based on the first environment adjustment parameter, where the second environment state is used to adjust the sleep state of the user from the current sleep state to the target sleep state, and the current sleep state and the target sleep state are in different levels.
FIG. 6 is a diagram illustrating an internal architecture of an electronic device in one embodiment. As shown in fig. 6, the electronic apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the electronic device stores an operating system and also stores a program, and the program can enable the processor to realize the sleep environment adaptive adjustment method when being executed by the processor. The internal memory may also have a program stored therein, which when executed by the processor, causes the processor to perform the sleep environment adaptive adjustment method. The display screen of the electronic device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic device can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic device, an external keyboard, a touch pad or a mouse, and the like.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device), or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A sleep environment adaptive adjustment method is characterized by comprising the following steps:
acquiring a first environment state and a current sleep state of a user in a plurality of sleep states, wherein the sleep states are sleep states with different grades;
inputting the current sleep state and the first environment state into a current value network model, and acquiring a first environment adjustment parameter output by the current value network model, wherein the current value network model is a neural network model obtained by training through reinforcement learning;
and adjusting the first environment state to a second environment state based on the first environment adjustment parameter, wherein the second environment state is used for adjusting the sleep state of the user from the current sleep state to a target sleep state, and the current sleep state and the target sleep state are different in grade.
2. The method of claim 1, wherein after obtaining the first environment adjustment parameter output by the current value network model, the method further comprises:
obtaining an expected value output by the current value network model, wherein the expected value is used for representing the success probability of adjusting the sleep state of the user from the current sleep state to the target sleep state through the first environment adjustment parameter;
and saving the current sleep state, the first environment adjustment parameter, the expected value and an actual sleep state as a record in a storage memory unit, wherein the actual sleep state is a sleep state to which the sleep state of the user is actually adjusted from the current sleep state.
3. The method of claim 2, wherein after adjusting the first environmental state to a second environmental state based on the first environmental adjustment parameter, the method further comprises:
performing a plurality of updating operations on the value of the parameter in the current value network model by using a target value network model in the case that the number of records in the storage memory unit reaches a target threshold value, wherein the target value network model and the current value network model have the same structure, and the value of the parameter in the target value network model is the same as the value of the parameter in the current value network model before performing the plurality of updating operations;
and after the values of the parameters in the current value network model are subjected to the plurality of updating operations, transmitting the values of the parameters in the current value network model to the parameters in the target value network model.
4. The method of claim 3, wherein performing a plurality of update operations on the values of the parameters in the current value network model using a target value network model comprises performing each of the update operations as follows:
selecting a plurality of records from the storage memory unit;
acquiring a first expected value of the current value network model and a second expected value of the target value network model, wherein the first expected value is the maximum value of expected values output by the current value network model when the plurality of records are input, and the second expected value is the maximum value of expected values output by the target value network model when the plurality of records are input;
and updating the parameters in the current value network model by using the first expected value and the second expected value.
5. The method of claim 3, further comprising determining whether the number of records in the storage memory unit reaches the target threshold by:
and counting whether the number of records saved in the storage memory unit after the step of transferring the value of the parameter in the current value network model to the parameter in the target value network model is executed in the previous time reaches the target threshold value.
6. The method of claim 3, wherein after passing the values of the parameters in the current-value network model to the parameters in the target-value network model, the method further comprises:
and continuously using the current value network model to determine the environment adjustment parameters corresponding to the sleep state and the environment state of the user.
7. The method of claim 1, wherein adjusting the first environmental state to a second environmental state based on the first environmental adjustment parameter comprises:
and adjusting the temperature and the humidity according to the first environment adjustment parameter, wherein the first environment state comprises the temperature and the humidity of the household appliance before adjustment, and the second environment state comprises the temperature and the humidity of the household appliance after adjustment.
8. A sleep environment adaptive adjustment device is characterized by comprising:
the system comprises a state acquisition module, a state acquisition module and a sleep state acquisition module, wherein the state acquisition module is used for acquiring a first environment state and the current sleep state of a user in a plurality of sleep states, and the sleep states are sleep states with different grades;
the state transmission module is used for inputting the current sleep state and the first environment state into a current value network model and acquiring a first environment adjustment parameter output by the current value network model, wherein the current value network model is a neural network model obtained by training through reinforcement learning;
a state adjustment module, configured to adjust the first environment state to a second environment state based on the first environment adjustment parameter, where the second environment state is used to adjust the sleep state of the user from the current sleep state to a target sleep state, and the current sleep state and the target sleep state are in different levels.
9. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911102130.5A 2019-11-12 2019-11-12 Self-adaptive adjustment method and device for sleep environment and electronic equipment Pending CN110989735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911102130.5A CN110989735A (en) 2019-11-12 2019-11-12 Self-adaptive adjustment method and device for sleep environment and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911102130.5A CN110989735A (en) 2019-11-12 2019-11-12 Self-adaptive adjustment method and device for sleep environment and electronic equipment

Publications (1)

Publication Number Publication Date
CN110989735A true CN110989735A (en) 2020-04-10

Family

ID=70083932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911102130.5A Pending CN110989735A (en) 2019-11-12 2019-11-12 Self-adaptive adjustment method and device for sleep environment and electronic equipment

Country Status (1)

Country Link
CN (1) CN110989735A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109568760A (en) * 2017-09-29 2019-04-05 中国移动通信有限公司研究院 Sleep environment adjusting method and system
US20190201658A1 (en) * 2017-12-28 2019-07-04 Adam Hewett Method to Increase Quality of Sleep with Acoustic Intervention
CN108310587A (en) * 2018-02-02 2018-07-24 贺鹏程 A kind of sleep control device and method
CN108983625A (en) * 2018-07-20 2018-12-11 山东大学深圳研究院 A kind of smart home system and service creation method
CN109032223A (en) * 2018-08-17 2018-12-18 三星电子(中国)研发中心 A kind of sleep environment regulating device, system and method
CN110162124A (en) * 2019-05-09 2019-08-23 廖醇祖 A kind of adjusting method of sleep environment, system and device
CN110134165A (en) * 2019-05-13 2019-08-16 北京鹏通高科科技有限公司 A kind of intensified learning method and system for environmental monitoring and control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘全 (LIU Quan) et al.: "深度强化学习综述" [A Survey of Deep Reinforcement Learning], 《计算机学报》 (Chinese Journal of Computers) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113926045A (en) * 2021-11-22 2022-01-14 紫罗兰家纺科技股份有限公司 Intelligent control method and system for sleep-assisting home textile product
CN113926045B (en) * 2021-11-22 2023-10-13 紫罗兰家纺科技股份有限公司 Intelligent control method and system for home textile product assisting sleep
CN114415766A (en) * 2022-01-21 2022-04-29 中国人民解放军总医院第一医学中心 Intelligent environment regulation and control system applied to ICU ward
CN114205393A (en) * 2022-02-16 2022-03-18 慕思健康睡眠股份有限公司 Data reporting method and system of intelligent home system
CN114205393B (en) * 2022-02-16 2022-05-17 慕思健康睡眠股份有限公司 Data reporting method and system of intelligent home system
CN116594313A (en) * 2023-07-18 2023-08-15 合肥战聚智能科技有限公司 Smart home equipment management method, system, equipment and medium
CN116594313B (en) * 2023-07-18 2023-10-20 合肥战聚智能科技有限公司 Smart home equipment management method, system, equipment and medium
CN117806214A (en) * 2023-12-29 2024-04-02 深圳市中科传感技术有限公司 Intelligent mattress control method, device and equipment based on AI algorithm and reverse S support

Similar Documents

Publication Publication Date Title
CN110989735A (en) Self-adaptive adjustment method and device for sleep environment and electronic equipment
US11626031B2 (en) Systems and techniques for tracking sleep consistency and sleep goals
US11268713B2 (en) Smart home air conditioner automatic control system based on artificial intelligence
CN108983625B (en) Intelligent household system and service generation method
CN101937194A (en) Intelligence control system with learning function and method thereof
CN110824944A (en) Sleep behavior information prediction method and system based on intelligent household equipment
CN109541947A (en) Control method, device, equipment and the smart home device of smart home device
US20190278242A1 (en) Training server and method for generating a predictive model for controlling an appliance
CN112443954B (en) Control method of air conditioner, air conditioner and computer readable storage medium
WO2023010859A1 (en) Method and apparatus for air conditioner control, and air conditioner
CN113432286B (en) Control method and control device for air conditioner and air conditioner
CN112147911A (en) Equipment control method and device
JP2020129188A (en) Method for learning, learning program, and learning device
US20150335506A9 (en) Sleep Cycle Bed
JPWO2015108179A1 (en) Operation parameter value learning device, operation parameter value learning method, and learning type device control device
CN113746708B (en) Electrical appliance configuration method and device, intelligent home system and computer equipment
CN111012132A (en) Sleep state adjusting method and device based on pillow and intelligent pillow
CN112006652A (en) Sleep state detection method and system
KR101880273B1 (en) Method and apparatus for monitoring posture of users
US10806394B2 (en) Information processing method and information processing system
CN114052513B (en) Cooking processing method and device, household appliance and storage medium
CN111189489B (en) Resident behavior identification system and resident behavior identification method
EP4061665A1 (en) Apparatus and method for controlling vehicle functions
CN104750880A (en) Big data-based early warning method and big data-based early warning method for human body cold resistance
CN114051420B (en) sleep control device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200410)