CN113077870A - Diet plan decision method and device, computer equipment and storage medium


Info

Publication number
CN113077870A
Authority
CN
China
Prior art keywords
execution
target
state data
physiological state
plan
Prior art date
Legal status
Pending
Application number
CN202110476903.7A
Other languages
Chinese (zh)
Inventor
袁东昇
阮晓雯
肖京
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110476903.7A
Publication of CN113077870A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/60 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance, relating to nutrition control, e.g. diets
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Nutrition Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application relates to the field of big data processing, and discloses a diet plan decision method and device, a computer device, and a storage medium. The method comprises the following steps: acquiring current physiological state data and target physiological state data of a user; taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a number of execution plans and their reward values based on the model; and selecting an execution plan that meets preset conditions as the target execution plan according to the reward values, and outputting the execution actions and physiological state data of each day contained in the target execution plan. With the method and device, long-horizon planned diet data can be output, and the planning efficiency of diet plans is improved.

Description

Diet plan decision method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of big data, and in particular, to a diet plan decision method, an apparatus, a computer device, and a storage medium.
Background
Health is a necessary requirement for promoting people's overall development, and the demand for healthy diets increases year by year. Current diet health management tools can only identify and analyze the calories of a single meal and notify the user of that calorie count. The inventors realized that such tools cannot provide continuous diet suggestions, so the degree of automation and the efficiency of diet suggestions are low.
Disclosure of Invention
The present application mainly aims to provide a diet plan decision method and device, a computer device, and a storage medium, so as to solve the problem that current diet suggestions are poorly automated and inefficient.
In order to achieve the above object, the present application provides a method for deciding a diet plan, comprising:
acquiring current physiological state data and target physiological state data of a user;
taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a plurality of execution plans and reward values of the execution plans based on the reinforcement learning sequential decision model;
and selecting an execution plan meeting preset conditions as a target execution plan according to the reward value, and outputting the execution action and physiological state data of each day contained in the target execution plan.
Further, the sequential decision model based on reinforcement learning outputs several execution plans and reward values of the execution plans, including:
taking the current physiological state data of the user as the physiological state data of the previous day;
acquiring a plurality of execution actions from a preset database, and sequentially selecting one execution action as the execution action of the previous day;
sequentially calculating the physiological state data of the next day obtained after the physiological state data of the previous day executes the execution action of the previous day, until the execution action of the last day of the execution period is executed, to obtain the result physiological state data;
combining the associated execution actions of each day to generate a plurality of execution plans and result physiological state data corresponding to each execution plan;
comparing the result physiological state data with the target physiological state data to determine the reward values of the different execution plans.
Further, before the acquiring a plurality of execution actions from a preset database and sequentially selecting one execution action as the execution action of the previous day, the method further includes:
acquiring the execution period of the execution plan to be output, and determining the execution difficulty according to the execution period;
and acquiring a plurality of execution actions meeting the execution difficulty from a preset database according to the execution difficulty.
Further, after selecting an execution plan meeting a preset condition as a target execution plan according to the reward value, the method further includes:
matching target users with the same target execution plan for the user;
and establishing an association relation between the user and the target users, and pushing progress information and reference data information of the target execution plan of the target users to the user based on the association relation.
Further, the execution action includes diet information, and the diet information includes a food category; the outputting of the execution actions of each day contained in the target execution plan comprises:
acquiring the regional characteristics and dietary habits of the user;
matching a target food category according to the regional characteristics and the dietary habits;
replacing the food category in the diet information of the execution action with the target food category.
Further, the execution action includes diet information, and the diet information includes a food category; the outputting of the execution actions of each day contained in the target execution plan comprises:
acquiring the food category to be replaced selected by the user;
matching a target food category with equivalent calorie information according to the calorie information of the food category to be replaced;
replacing the food category to be replaced in the diet information of the execution action with the target food category.
Further, after selecting an execution plan meeting preset conditions as a target execution plan according to the reward value and outputting the execution action and physiological state data of each day included in the target execution plan, the method further includes:
and sending the target execution plan to a preset supervisor.
The present application also provides a diet plan decision device, comprising:
a data acquisition module: used for acquiring current physiological state data and target physiological state data of a user;
a plan generation module: used for taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a number of execution plans and their reward values based on the model;
a plan output module: used for selecting an execution plan that meets preset conditions as the target execution plan according to the reward values, and outputting the execution actions and physiological state data of each day contained in the target execution plan.
The present application further provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method for dietary plan decision-making as described in any one of the above when executing the computer program.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of deciding a diet plan according to any one of the above.
The application embodiments provide a decision method for a long-horizon diet plan. In a diet management scenario, the current physiological state data and target physiological state data of a user are acquired and used as the input of a reinforcement learning sequential decision model. The model calculates, for different execution actions, the physiological state data of the next day from the physiological state data of the previous day, and thereby the post-execution physiological state data of each day in a preset period. The different execution actions of each day in the preset period are combined to form an execution plan, the physiological state data of the last day of the plan is taken as its result physiological state data, and the result physiological state data is compared with the target physiological state data to obtain the reward value of each execution plan. A target execution plan meeting the conditions is then selected according to the reward values, and the execution actions and physiological state data contained in the target execution plan are output; the user completes the daily diet arrangement according to the execution actions. In this way, long-term planned diet data is output, the planning efficiency of diet plans is improved, and the user is helped to reach the target body state data within the execution period.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a dietary plan decision method according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a dietary plan decision method of the present application;
FIG. 3 is a schematic diagram of an embodiment of a decision device for a dietary plan of the present application;
FIG. 4 is a block diagram illustrating a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a dietary plan decision method, which includes steps S10-S30, and the steps of the dietary plan decision method are described in detail as follows.
S10, acquiring the current physiological state data and the target physiological state data of the user.
This embodiment is applied to a diet management scenario in which a user needs to progress from a current body state to a target body state. The body state is represented by a series of quantifiable data, including height, weight, sex, age, vital capacity, jumping ability, body measurements, body fat ratio, and the like, so a body state is defined by a number of different physiological state data. In one embodiment, the user may input current body state information, or the current body state information may be obtained through a specific measuring tool; this information is defined as the current physiological state data. For example, the height, weight, vital capacity, jumping ability, body measurements, body fat ratio, and so on measured by the tool are acquired and used to generate the current physiological state data of the user. In addition, the body state the user wants to achieve must be determined, that is, the target physiological state data must be acquired. In one embodiment, the user inputs a target value for each physiological state indicator, thereby providing the target physiological state data; alternatively, physiological state data meeting physiological health standards can be derived from some of the user's current physiological state data and taken as the target physiological state data.
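For illustration only, the physiological state data above could be carried in a small structure. The sketch below assumes hypothetical field names mirroring the indicators listed in this embodiment; the application does not prescribe a data layout.

```python
from dataclasses import dataclass

@dataclass
class PhysiologicalState:
    """Quantified body-state indicators (field names are illustrative)."""
    height_cm: float
    weight_kg: float
    sex: str            # "male" or "female"
    age_years: int
    vital_capacity_ml: float
    body_fat_ratio: float

# Current state measured by a tool; target state entered by the user.
current = PhysiologicalState(175.0, 82.0, "male", 30, 3500.0, 0.28)
target = PhysiologicalState(175.0, 72.0, "male", 30, 4200.0, 0.20)
```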
S20, taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a number of execution plans and the reward values of the execution plans based on the model.
In this embodiment, after the current physiological state data and the target physiological state data of the user are obtained, an execution plan for achieving the target physiological state data needs to be generated for the user according to the current physiological state data and the target physiological state data.
The sequential decision model of reinforcement learning is constructed by the following steps:
acquiring physiological index state quantities according to the model configuration information, and acquiring physiological index state quantities of different users as first training data;
acquiring the executing action state quantity according to the model configuration information, and acquiring a combination of different executing action state quantities as second training data;
substituting the first training data and the second training data into a human body metabolism formula for model training, calculating human body metabolism results under different execution actions, and returning a reward value according to the human body metabolism results;
and performing iterative optimization based on each reward value, so that the model optimally learns towards the direction of the maximum reward value, and a sequential decision model for reinforcement learning is generated.
The physiological index state quantities and execution action state quantities are acquired according to the model configuration information, which contains the definitions and quantization standards of the different physiological indexes and of the different execution actions. The physiological index state quantities and execution action state quantities are then defined according to the configuration information, and each physiological index state and each execution action is numerically quantized. After these elements are defined, the physiological index state quantities of different users are acquired as first training data, and combinations of different execution action state quantities are acquired as second training data. The first and second training data are substituted into a human metabolism formula for model training: the interaction process between the human body and the environment is simulated many times, and in each interaction the model calculates the human metabolism result under different execution actions and returns a reward value according to that result. The optimal reward value is selected and executed, and iterative optimization is performed based on each reward value, so that the model learns in the direction of the maximum reward value, generating the reinforcement learning sequential decision model.
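The construction loop above can be pictured with a short sketch. This is only one assumed realization (a tabular value estimate per (day, action) pair, nudged toward each episode's reward); the application does not fix a specific reinforcement learning algorithm, and `simulate_day` / `reward_fn` stand in for the human-metabolism step and the reward comparison described above.

```python
import random

def train_policy(actions, simulate_day, reward_fn, start_state,
                 days=30, episodes=500, alpha=0.1, epsilon=0.2):
    """Illustrative tabular learner: q[(day, action)] estimates each daily
    action's contribution to the end-of-period reward."""
    q = {(d, a): 0.0 for d in range(days) for a in range(len(actions))}
    for _ in range(episodes):
        state, taken = start_state, []
        for d in range(days):
            if random.random() < epsilon:          # explore a random action
                a = random.randrange(len(actions))
            else:                                  # exploit the best estimate so far
                a = max(range(len(actions)), key=lambda i: q[(d, i)])
            state = simulate_day(state, actions[a])   # metabolism step for one day
            taken.append((d, a))
        r = reward_fn(state)                       # compare the result with the target
        for d, a in taken:                         # move estimates toward the reward
            q[(d, a)] += alpha * (r - q[(d, a)])
    return q
```

Here `actions` could be daily diet entries with calorie values, `simulate_day` a metabolism step such as the formulas given further below, and `reward_fn` the negative gap to the target state.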
Specifically, the reinforcement learning sequential decision model first obtains the current physiological state data of the user, then obtains a number of candidate execution actions for each day from a preset database, the execution actions of each day being different. It calculates the physiological state data of the next day obtained after the user's current physiological state data executes each of the different execution actions, records the current physiological state data and the next day's physiological state data, takes the next day's physiological state data as the new current physiological state data, again obtains the execution actions of that day, calculates the physiological state data of the following day, and so on. After the execution actions of a preset period (the preset planned number of days for changing from the current state data to the target state data) have been executed starting from the current state data, different result physiological state data are obtained. The different execution actions of each day in the preset period are combined to form an execution plan, and the reward value of each execution plan is determined from the difference between its result physiological state data and the target physiological state data. The smaller the difference between the result physiological state data and the target physiological state data, the larger the reward value; the larger the difference, the smaller the reward value.
For example, the physiological state data of the user on the first day is A and the acquired execution action is D1; the physiological state data of the second day obtained after executing D1 is A1. Execution action D2 is then acquired, and the physiological state data of the third day obtained after executing D2 is A2. All execution actions of the preset period are executed in turn until the execution action Dx of the last day is executed, yielding the result physiological state data Ax. An execution plan is composed of D1, D2, ..., Dx, and the result physiological state data Ax is compared with the target physiological state data to determine the reward value of the plan: the smaller the difference, the greater the reward value; the larger the difference, the smaller the reward value.
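As a sketch of this worked example, assuming a scalar state and a caller-supplied daily update function (both simplifications, not the application's full physiological state):

```python
def rollout_plan(state_a, actions, step_fn, target):
    """Follow the example: A --D1--> A1 --D2--> A2 ... --Dx--> Ax, then
    reward the plan by how close Ax lands to the target state."""
    history, state = [], state_a
    for day, action in enumerate(actions, start=1):
        state = step_fn(state, action)        # physiological update for this day
        history.append((f"D{day}", state))    # record (action label, new state)
    reward = -abs(state - target)             # smaller difference => larger reward
    return history, reward

# Usage: each action is a hypothetical daily net energy balance in kcal, and
# 3889 is the conversion ratio stated in the embodiment (values invented here).
step = lambda weight, kcal: weight + kcal / 3889.0
plan, r = rollout_plan(82.0, [-500, -300, -500], step, target=81.5)
```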
S30, selecting an execution plan meeting preset conditions as a target execution plan according to the reward value, and outputting the execution action and physiological state data of each day contained in the target execution plan.
In this embodiment, after a number of execution plans have been obtained, an execution plan meeting a preset condition is selected according to the reward values; the magnitude of the reward value reflects the difference between the result physiological state data of each execution plan and the target physiological state data, and the execution plan whose reward value meets the preset condition is selected as the target execution plan. In one embodiment, the execution plan with the maximum reward value is selected as the target execution plan, and the execution actions contained in it are output. Each execution plan contains the execution actions of each day in the execution period; an execution action comprises diet information (food materials, ingredients, and preparation methods) and diet execution information (meal time points and meal durations). The daily physiological state data of the user is also output. After the daily execution actions and physiological state data contained in the target execution plan are output, the user can determine the execution action of each day according to the plan, that is, the diet information and diet execution information of each day, and complete the daily diet arrangement accordingly, thereby reaching the target body state data within the execution period.
This embodiment provides a decision method for a long-horizon diet plan. In a diet management scenario, the current physiological state data and target physiological state data of a user are acquired and used as the input of a reinforcement learning sequential decision model. The model calculates, for different execution actions, the physiological state data of the next day from the physiological state data of the previous day, and thereby the post-execution physiological state data of each day in a preset period. The different execution actions of each day in the preset period are combined to form an execution plan, the physiological state data of the last day is taken as the plan's result physiological state data, and the result is compared with the target physiological state data to obtain the reward value of each plan. A target execution plan meeting the conditions is selected according to the reward values, and its execution actions and physiological state data are output; the user completes the daily diet arrangement according to the execution actions. Long-term planned diet data is thus output, the planning efficiency of diet plans is improved, and the user is helped to reach the target body state data within the execution period.
In one embodiment, as shown in fig. 2, the outputting of several execution plans and reward values of the execution plans based on the reinforcement learning sequential decision model includes:
S21: taking the current physiological state data of the user as the physiological state data of the previous day;
S22: acquiring a plurality of execution actions from a preset database, and sequentially selecting one execution action as the execution action of the previous day;
S23: sequentially calculating the physiological state data of the next day obtained after the physiological state data of the previous day executes the execution action of the previous day, until the execution action of the last day of the execution period is executed, to obtain the result physiological state data;
S24: combining the associated execution actions of each day to generate a plurality of execution plans and the result physiological state data corresponding to each execution plan;
S25: comparing the result physiological state data with the target physiological state data to determine the reward values of the different execution plans.
In this embodiment, the reinforcement learning sequential decision model first obtains the current physiological state data of the user and takes it as the physiological state data of the previous day (the day before the next day), specifically as the physiological state data of the first day. A number of different execution actions are acquired from a preset database, one execution action is selected in turn as the execution action of the previous day, and the physiological state data of the next day is calculated from the physiological state data of the previous day executing that action. The process starts from the first day: an execution action is selected as the execution action of the first day, and the physiological state data of the second day is calculated from the first day's data executing it; an execution action is then selected for the second day, and the physiological state data of the third day is calculated in the same way; and so on, until the execution action of the last day of the execution period has been executed, yielding the result physiological state data. The associated execution actions of each day are then combined to generate a number of execution plans and their corresponding result physiological state data. "Associated execution actions of each day" refers to a particular combination of different execution actions. For example, if the candidate execution actions include D1a, D1b, and D1c, and D1a is executed on the first day and D1c on the second day, then D1a and D1c are associated execution actions; if D1b is executed on the first day and D1c on the second day, then D1b and D1c are associated execution actions. This combination step is sketched below.
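The combination of associated execution actions amounts to a Cartesian product over each day's candidates. A two-day sketch mirroring the D1a/D1b/D1c example (the labels are taken from the text; the per-day candidate sets are assumptions):

```python
from itertools import product

day1_candidates = ["D1a", "D1b"]   # candidate execution actions for day 1
day2_candidates = ["D1c"]          # candidate execution actions for day 2

# Each tuple is one set of associated execution actions, i.e. one execution plan.
plans = list(product(day1_candidates, day2_candidates))
print(plans)   # [('D1a', 'D1c'), ('D1b', 'D1c')]
```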
One embodiment of calculating the change in the physiological state data after an execution action is as follows:
total daily energy expenditure of the human body = 0.95 × daily basal metabolic expenditure × PAL;
where: daily basal metabolic energy expenditure = BMR (kcal/m²/h) × 24 h × body surface area;
PAL (physical activity level) is the total energy expenditure in 24 h divided by the energy expended by 24 h of basal metabolism. The basal metabolic rate (BMR) is the basal metabolic energy consumed by the human body per square meter of body surface area per unit time, in kcal/m²/h. Further, the basal metabolic rate for men is Men BMR = 66.4730 + (13.7516 × weight in kg) + (5.0033 × height in cm) - (6.7550 × age in years); the basal metabolic rate for women is Women BMR = 655.0955 + (9.5634 × weight in kg) + (1.8496 × height in cm) - (4.6756 × age in years); body surface area (m²) = 0.00659 × height (cm) + 0.0126 × weight (kg) - 0.1603. Finally: (human energy intake - human energy expenditure) / conversion ratio = the change in the physiological state data (weight), where the scientifically measured conversion value is 3889 calories/kg.
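Transcribed into code, the formulas above read as below. This is a direct transcription of the equations as stated in the embodiment (the Harris-Benedict-style BMR fits, the body surface area fit, and the stated 3889 conversion ratio), not an independently validated nutrition model.

```python
def bmr_kcal(sex: str, weight_kg: float, height_cm: float, age_years: float) -> float:
    """Basal metabolic rate per the embodiment's sex-specific formulas."""
    if sex == "male":
        return 66.4730 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.7550 * age_years
    return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_years

def body_surface_area_m2(height_cm: float, weight_kg: float) -> float:
    """Body surface area (m^2) = 0.00659*height + 0.0126*weight - 0.1603."""
    return 0.00659 * height_cm + 0.0126 * weight_kg - 0.1603

def total_daily_expenditure(sex, weight_kg, height_cm, age_years, pal) -> float:
    """Total daily energy expenditure = 0.95 x daily basal expenditure x PAL."""
    return 0.95 * bmr_kcal(sex, weight_kg, height_cm, age_years) * pal

CONVERSION_RATIO = 3889.0   # conversion value stated in the embodiment

def weight_change_kg(intake_kcal: float, expenditure_kcal: float) -> float:
    """(intake - expenditure) / conversion ratio = change in weight."""
    return (intake_kcal - expenditure_kcal) / CONVERSION_RATIO
```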
After each execution plan and its result physiological data have been obtained, the result physiological state data is compared with the target physiological state data to determine the reward value of each plan: the smaller the difference, the closer the result physiological data is to the target and the larger the plan's reward value; the larger the difference, the smaller the reward value. By continually combining different execution actions, the execution plan with the maximum reward value within the preset period can be obtained, and its result physiological state data trends toward the target physiological state data.
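One hedged way to turn this comparison into a reward over several indicators at once is a negative weighted distance; the weights and dictionary layout below are assumptions for illustration.

```python
def plan_reward(result_state: dict, target_state: dict, weights: dict = None) -> float:
    """Reward grows as the result state approaches the target across all
    shared indicators: smaller difference => larger (less negative) value."""
    keys = result_state.keys() & target_state.keys()
    weights = weights or {k: 1.0 for k in keys}
    gap = sum(weights[k] * abs(result_state[k] - target_state[k]) for k in keys)
    return -gap

print(plan_reward({"weight_kg": 74.0, "body_fat": 0.22},
                  {"weight_kg": 72.0, "body_fat": 0.20}))   # approx. -2.02
```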
In one embodiment, before the acquiring a plurality of execution actions from a preset database and sequentially selecting one execution action as the execution action of the previous day, the method further includes:
acquiring the execution period of the execution plan to be output, and determining the execution difficulty according to the execution period;
and acquiring a plurality of execution actions meeting the execution difficulty from a preset database according to the execution difficulty.
In this embodiment, in a specific application scenario, to reach the target body state data quickly, the body state must reach the target within a shorter execution period, which requires raising the execution difficulty of the execution actions: the further the daily change in physiological state data in the execution plan deviates from the average value, the higher the execution difficulty of that day's action and the higher the execution difficulty of the plan as a whole. Specifically, the execution period of the execution plan to be output is acquired, the execution difficulty of the execution actions is determined according to the execution period, and a number of execution actions meeting that difficulty are acquired from the preset database. The selected execution actions therefore match the required difficulty, excessive differences in difficulty between successive actions in the plan are avoided, and the reasonableness of the execution plan is improved. Further, the execution period may be adjusted continuously, subject to a safety limit on the period; the reward values of several execution plans under different execution periods are then calculated with the reinforcement learning sequential decision model, and the plans for the different periods are determined, giving the user a time reference for reaching the target body state data. For example, in a weight-class scenario, a user must reach a certain weight class within a specific time, so the execution period of the plan can be fixed and the execution difficulty determined accordingly, enabling the user's body state data to reach that weight class in time.
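A sketch of the period-to-difficulty mapping, with made-up thresholds (the application specifies the dependence but not concrete numbers):

```python
def difficulty_for_period(total_change_kg: float, period_days: int) -> str:
    """Shorter period for the same goal => larger required daily change => harder."""
    daily = abs(total_change_kg) / max(period_days, 1)
    if daily > 0.10:
        return "high"
    if daily > 0.05:
        return "medium"
    return "low"

def actions_for_difficulty(action_db: list, difficulty: str) -> list:
    """Keep only preset actions whose difficulty tag matches the required tier."""
    return [a for a in action_db if a["difficulty"] == difficulty]

# Example: lose 6 kg in 30 days -> 0.2 kg/day -> only "high"-difficulty actions.
print(difficulty_for_period(-6.0, 30))   # -> "high"
```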
In an embodiment, after the step S30 of selecting an execution plan meeting a preset condition as a target execution plan according to the bonus value, the method further includes:
matching target users with the same target execution plan for the user;
and establishing an association relation between the user and the target users, and pushing progress information and reference data information of the target execution plan of the target users to the user based on the association relation.
In this embodiment, after a target execution plan meeting the preset condition has been selected for the user according to the reward values, target users with the same target execution plan are matched to the user so that users on the same plan can refer to and encourage one another. An association relation between the user and the target users is established and contact information is pushed, so the user can follow the target users' plan execution; the target users include users who are already executing the target execution plan. Based on the association relation, the progress information and reference data information of the target users' plan are pushed to the user, who can observe the daily changes in the target users' body state data under the same plan. Knowing in advance, from the reference data, how the body state data is likely to change, the user gains a clearer picture of the plan's implementation effect.
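A minimal matching sketch, assuming user records carry hypothetical `user_id` and `plan_id` fields:

```python
def match_target_users(user: dict, all_users: list) -> list:
    """Target users are other users already executing the same target plan;
    their progress and reference data can then be pushed to this user."""
    return [u for u in all_users
            if u["user_id"] != user["user_id"] and u["plan_id"] == user["plan_id"]]

me = {"user_id": 1, "plan_id": "plan-42"}
others = [{"user_id": 2, "plan_id": "plan-42"}, {"user_id": 3, "plan_id": "plan-7"}]
print(match_target_users(me, others))   # -> [{'user_id': 2, 'plan_id': 'plan-42'}]
```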
In one embodiment, the execution action includes diet information, and the diet information includes a food category; the outputting of the execution actions of each day contained in the target execution plan comprises:
acquiring the regional characteristics and dietary habits of the user;
matching a target food category according to the regional characteristics and the dietary habits;
replacing the food category in the diet information of the execution action with the target food category.
In this embodiment, the execution action includes diet information, and the diet information includes food categories; that is, the execution plan contains daily diet plan information. When the target execution plan is output, the regional characteristics and dietary habits of the user are acquired, since users in different regions have different dietary habits. The regional characteristics and dietary habits can be matched through big data, which determines the dietary habits of users in different regions; a target food category is then matched according to the regional characteristics and dietary habits, and the food category of the execution action in the target execution plan is replaced with the target food category. Different food categories can thus be matched for different users, meeting the requirements of users with different regional characteristics and dietary habits and improving the reasonableness of the matched diet data. For example, when the lunch calorie content of the diet information in an execution action is 50K, the target food category matched to the eating habits of users in region A is food S1, while the target food category matched for users in region B is food S2.
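The region-based substitution can be pictured as a lookup keyed by region and meal slot; the table below is invented and simply mirrors the region A/B example from the text:

```python
# Hypothetical mapping from (region, meal slot) to the matched food category,
# mirroring the example: same lunch calories, food S1 in region A, S2 in region B.
REGIONAL_FOODS = {("A", "lunch"): "food S1", ("B", "lunch"): "food S2"}

def regional_substitute(region: str, meal: str, default_food: str) -> str:
    """Replace the plan's food category with the region-matched one, if any."""
    return REGIONAL_FOODS.get((region, meal), default_food)

print(regional_substitute("A", "lunch", "default"))   # -> "food S1"
```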
In one embodiment, the execution action includes diet information, and the diet information includes a food category; the outputting of the execution actions of each day contained in the target execution plan comprises:
acquiring the food category to be replaced selected by the user;
matching a target food category with equivalent calorie information according to the calorie information of the food category to be replaced;
replacing the food category to be replaced in the diet information of the execution action with the target food category.
In this embodiment, users from different regions or with different eating habits have different food preferences. When the target execution plan and the daily execution actions it contains are output, if the user wants to replace some category of food, the food category to be replaced selected by the user is acquired, along with its occurrences in the daily execution actions of the target execution plan. A target food category with equivalent calorie information is then matched according to the calorie information of the category to be replaced, and the category to be replaced in the diet information of the execution actions is replaced with the target category. The food categories in the user's diet information can thus be swapped according to preference, making the diet information in the target execution plan more personalized and better suited to different users. Optionally, after a food category to be replaced is selected, partial or full replacement may be chosen: partial replacement replaces the selected category only within a chosen time span of the execution plan, while full replacement replaces every occurrence of the category with the target category, meeting the dietary requirements of different users.
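A sketch of the calorie-equivalent matching under an assumed per-food calorie table; the tolerance and the values are placeholders, not data from the application:

```python
def equivalent_substitute(food_to_replace: str, calorie_table: dict,
                          tolerance_kcal: float = 10.0):
    """Pick the food whose calories are closest to (and within a tolerance of)
    the food being replaced; returns None if nothing is close enough."""
    base = calorie_table[food_to_replace]
    candidates = {f: c for f, c in calorie_table.items() if f != food_to_replace}
    best = min(candidates, key=lambda f: abs(candidates[f] - base))
    return best if abs(candidates[best] - base) <= tolerance_kcal else None

# Example: swap rice for a calorie-equivalent alternative (illustrative values).
table = {"rice": 130.0, "noodles": 138.0, "potato": 77.0}
print(equivalent_substitute("rice", table))   # -> "noodles"
```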
In one embodiment, after the step S30 selecting an execution plan meeting a preset condition as a target execution plan according to the reward value, and outputting the execution action and physiological status data of each day included in the target execution plan, the method further includes:
and sending the target execution plan to a preset supervisor.
In this embodiment, after the target execution plan of the user is obtained, the plan is sent to a preset supervisor so that the user can better follow the daily diet it prescribes; the supervisor is notified to provide food for the user according to the target execution plan, helping the user reach the target physiological state data within the preset period. In an application scenario such as postpartum confinement, after a target execution plan has been determined from each user's current physiological state and target physiological state data, the plan is sent to the preset supervisor, who can prepare the user's meals every day according to it, ensuring a regular diet and the desired body state data during the confinement period.
Referring to fig. 3, the present application further provides a diet plan decision device, including:
the data acquisition module 10: used for acquiring current physiological state data and target physiological state data of a user;
the plan generation module 20: used for taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a number of execution plans and their reward values based on the model;
the plan output module 30: used for selecting an execution plan that meets preset conditions as the target execution plan according to the reward values, and outputting the execution actions and physiological state data of each day contained in the target execution plan.
As described above, it can be understood that each component of the diet plan decision device proposed in the present application can implement the functions of any of the diet plan decision methods described above.
In one embodiment, the plan generation module 20 further performs:
taking the current physiological state data of the user as the physiological state data of the previous day;
acquiring a plurality of execution actions from a preset database, and sequentially selecting one execution action as the execution action of the previous day;
sequentially calculating the physiological state data of the next day obtained after the physiological state data of the previous day executes the execution action of the previous day, until the execution action of the last day of the execution period is executed, to obtain the result physiological state data;
combining the associated execution actions of each day to generate a plurality of execution plans and result physiological state data corresponding to each execution plan;
comparing the result physiological state data with the target physiological state data to determine the reward values of the different execution plans.
In one embodiment, the plan generation module 20 further performs:
acquiring the execution period of the execution plan to be output, and determining the execution difficulty according to the execution period;
and acquiring a plurality of execution actions meeting the execution difficulty from a preset database according to the execution difficulty.
In one embodiment, the device further comprises a matching module, used for matching target users with the same target execution plan for the user, establishing an association relation between the user and the target users, and pushing progress information and reference data information of the target execution plan of the target users to the user based on the association relation.
In one embodiment, the plan output module 30 further performs:
acquiring the regional characteristics and dietary habits of the user;
matching a target food category according to the regional characteristics and the dietary habits;
replacing the food category in the diet information of the execution action with the target food category.
In one embodiment, the plan output module 30 further performs:
acquiring the food category to be replaced selected by the user;
matching a target food category with equivalent calorie information according to the calorie information of the food category to be replaced;
replacing the food category to be replaced in the diet information of the execution action with the target food category.
In one embodiment, the apparatus further comprises a supervisor module for sending the target execution plan to a preset supervisor.
Referring to fig. 4, a computer device, which may be a mobile terminal and whose internal structure may be as shown in fig. 4, is also provided in an embodiment of the present application. The computer device comprises a processor, a memory, a network interface, a display device, and an input device connected through a system bus. The network interface of the computer device is used to communicate with an external terminal through a network connection. The input device of the computer device is used to receive input from the user. The processor of the computer device provides computation and control capabilities. The memory of the computer device includes a storage medium, which stores an operating system, a computer program, and a database. The database of the computer device is used to store data. The computer program is executed by the processor to implement the diet plan decision method.
The processor executes the diet plan decision method, which comprises the following steps: acquiring current physiological state data and target physiological state data of a user; taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a plurality of execution plans and reward values of the execution plans based on the reinforcement learning sequential decision model; and selecting a target execution plan meeting preset conditions according to the reward value, and outputting the execution action and physiological state data of each day contained in the target execution plan.
The computer device implements a decision method for a long-horizon diet plan. In a diet management scenario, the current physiological state data and target physiological state data of a user are acquired and used as the input of a reinforcement learning sequential decision model. The model calculates, for different execution actions, the physiological state data of the next day from the physiological state data of the previous day, and thereby the post-execution physiological state data of each day in a preset period. The different execution actions of each day in the preset period are combined to form an execution plan, the physiological state data of the last day is taken as the plan's result physiological state data, and the result is compared with the target physiological state data to obtain the reward value of each plan. A target execution plan meeting the conditions is selected according to the reward values, and its execution actions and physiological state data are output; the user completes the daily diet arrangement according to the execution actions. Long-term planned diet data is thus output, the planning efficiency of diet plans is improved, and the user is helped to reach the target body state data within the execution period.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by the processor, implementing a method for dietary plan decision-making, comprising the steps of: acquiring current physiological state data and target physiological state data of a user; taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a plurality of execution plans and reward values of the execution plans based on the reinforcement learning sequential decision model; and selecting a target execution plan meeting preset conditions according to the reward value, and outputting the execution action and physiological state data of each day contained in the target execution plan.
The computer-readable storage medium implements a decision method for a long-horizon diet plan. In a diet management scenario, the current physiological state data and target physiological state data of a user are acquired and used as the input of a reinforcement learning sequential decision model. The model calculates, for different execution actions, the physiological state data of the next day from the physiological state data of the previous day, and thereby the post-execution physiological state data of each day in a preset period. The different execution actions of each day in the preset period are combined to form an execution plan, the physiological state data of the last day is taken as the plan's result physiological state data, and the result is compared with the target physiological state data to obtain the reward value of each plan. A target execution plan meeting the conditions is selected according to the reward values, and its execution actions and physiological state data are output; the user completes the daily diet arrangement according to the execution actions. Long-term planned diet data is thus output, the planning efficiency of diet plans is improved, and the user is helped to reach the target body state data within the execution period.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above.
Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory.
Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application.
All the equivalent structures or equivalent processes performed by using the contents of the specification and the drawings of the present application, or directly or indirectly applied to other related technical fields, are included in the scope of protection of the present application.

Claims (10)

1. A method for determining a dietary plan, comprising:
acquiring current physiological state data and target physiological state data of a user;
taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a plurality of execution plans and reward values of the execution plans based on the reinforcement learning sequential decision model;
and selecting an execution plan meeting preset conditions as a target execution plan according to the reward value, and outputting the execution action and physiological state data of each day contained in the target execution plan.
2. A method for dietary plan decision-making according to claim 1, wherein said reinforcement learning based sequential decision model outputs several execution plans and reward values for each execution plan, including:
taking the current physiological state data of the user as the physiological state data of the previous day;
acquiring a plurality of execution actions from a preset database, and sequentially selecting one execution action as the execution action of the previous day;
sequentially calculating the physiological state data of the next day obtained after the physiological state data of the previous day executes the execution action of the previous day, until the execution action of the last day of the execution period is executed, to obtain the result physiological state data;
combining the associated execution actions of each day to generate a plurality of execution plans and result physiological state data corresponding to each execution plan;
comparing the result physiological state data with the target physiological state data to determine the reward values of the different execution plans.
3. The diet plan decision method of claim 2, wherein before the acquiring a plurality of execution actions from a preset database and sequentially selecting one execution action as the execution action of the previous day, the method further comprises:
acquiring the execution period of the execution plan to be output, and determining the execution difficulty according to the execution period;
and acquiring a plurality of execution actions meeting the execution difficulty from a preset database according to the execution difficulty.
4. The dietary plan decision method of claim 1, wherein after selecting an execution plan meeting preset conditions as a target execution plan according to the reward value, the method further comprises:
matching target users with the same target execution plan for the user;
and establishing an association relation between the user and the target users, and pushing progress information and reference data information of the target execution plan of the target users to the user based on the association relation.
5. The diet plan decision method of claim 1, wherein the execution action comprises diet information, the diet information including a food category; and the outputting of the execution actions of each day contained in the target execution plan comprises:
acquiring the regional characteristics and dietary habits of the user;
matching a target food category according to the regional characteristics and the dietary habits;
replacing the food category in the diet information of the execution action with the target food category.
6. The diet plan decision method of claim 1, wherein the execution action comprises diet information, the diet information including a food category; and the outputting of the execution actions of each day contained in the target execution plan comprises:
acquiring the food category to be replaced selected by the user;
matching a target food category with equivalent calorie information according to the calorie information of the food category to be replaced;
replacing the food category to be replaced in the diet information of the execution action with the target food category.
7. The method for determining a dietary plan according to claim 1, wherein after selecting an execution plan meeting a predetermined condition as a target execution plan according to the reward value and outputting the daily execution action and physiological status data included in the target execution plan, the method further comprises:
and sending the target execution plan to a preset supervisor.
8. A dietary plan decision-making apparatus, comprising:
a data acquisition module: used for acquiring current physiological state data and target physiological state data of a user;
a plan generation module: used for taking the current physiological state data and the target physiological state data as the input of a reinforcement learning sequential decision model, and outputting a number of execution plans and their reward values based on the model;
a plan output module: used for selecting an execution plan that meets preset conditions as the target execution plan according to the reward values, and outputting the execution actions and physiological state data of each day contained in the target execution plan.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program performs the steps of a method of decision making for a dietary plan as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method of decision of a diet plan according to any one of claims 1 to 7.
CN202110476903.7A 2021-04-29 2021-04-29 Diet plan decision method and device, computer equipment and storage medium Pending CN113077870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110476903.7A CN113077870A (en) 2021-04-29 2021-04-29 Diet plan decision method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113077870A true CN113077870A (en) 2021-07-06

Family

ID=76616208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110476903.7A Pending CN113077870A (en) 2021-04-29 2021-04-29 Diet plan decision method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113077870A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990444A (en) * 2021-10-11 2022-01-28 医膳通(广东)信息技术有限公司 Intelligent nutrition diet management method and system based on data analysis and deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108575788A (en) * 2018-03-22 2018-09-28 苏州科技大学 A kind of pet automatic foodstuff delivering control system and method based on intensified learning
CN109754865A (en) * 2019-01-04 2019-05-14 同方健康科技(北京)股份有限公司 The management method and device of dietary program



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination