WO2021095561A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2021095561A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning
user
environment
state
information
Prior art date
Application number
PCT/JP2020/040771
Other languages
French (fr)
Japanese (ja)
Inventor
Takuro Kawai (川合 拓郎)
Original Assignee
Sony Group Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021095561A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/08 Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/06 Electrically-operated teaching apparatus or devices working with questions and answers of the multiple-choice answer-type, i.e. where a given question is provided with a series of answers and a choice has to be made from the answers
    • G09B7/07 Electrically-operated teaching apparatus or devices working with questions and answers of the multiple-choice answer-type, i.e. where a given question is provided with a series of answers and a choice has to be made from the answers, providing for individual presentation of questions to a plurality of student stations

Definitions

  • This technology relates to an information processing device, an information processing method, and a program, and particularly to an information processing device, an information processing method, and a program for improving the learning efficiency of a user.
  • Patent Document 1 discloses a technique for changing the difficulty level of a problem presented to a user according to the learning level or the like of the user.
  • Patent Document 2 discloses that a stress index is measured from a user's brain wave or the like, and white noise is generated based on the stress index to relieve stress.
  • Patent Document 3 discloses a technique for estimating a user's emotional state from the state of an electronic pen.
  • This technology was made in view of such a situation, and aims to improve the learning efficiency of the user.
  • The information processing device or program of one aspect of the present technology has a processing unit that calculates, based on the learning state of the user, changes to the user's learning environment that improve the learning state.
  • The information processing method of one aspect of the present technology is a method in which the processing unit of an information processing device including the processing unit calculates, based on the learning state of the user, changes to the user's learning environment that improve the learning state.
  • In one aspect of the present technology, changes to the user's learning environment that improve the learning state are calculated based on the learning state of the user.
  • FIG. 1 is a block diagram showing a configuration example of an embodiment of an information processing device to which the present technology is applied. FIG. 2 is a functional block diagram explaining the functions of the information processing device of FIG. 1. FIG. 3 is a diagram illustrating the types of sensors that the user state sensing unit can use for sensing the user state and the information (purpose of sensing) obtained by the sensors. FIG. 4 is a diagram illustrating the elements of the learning environment sensed by the learning environment sensing unit and the types of sensors. FIG. 5 is a diagram illustrating the elements of the learning environment controlled by the environment control unit and the types of environment control devices used to control each element. FIG. 6 is a flowchart explaining a processing example performed by the information processing device of FIG. 1.
  • FIG. 1 is a block diagram showing a configuration example of an embodiment of an information processing device to which the present technology is applied.
  • the information processing device 11 includes an information processing unit 12, various sensors 13, and various environmental control devices 14.
  • The information processing unit 12 includes a computer, and may be, for example, a personal computer, a smartphone, a tablet, a mobile phone, or the like.
  • Various sensors 13 include one or more types of sensors.
  • the various sensors 13 include sensors that sense the user's state such as the position and behavior of the user, and sensors that sense the user's learning environment such as sound and temperature.
  • the various sensors 13 are connected to the communication unit 27 or the connection port 28 described later of the information processing unit 12 and exchange information with the information processing unit 12.
  • the environmental control device 14 includes one or a plurality of types of devices that change the sound, temperature, etc. of the user's learning environment.
  • the environmental control device 14 is connected to the communication unit 27 or the connection port 28 described later of the information processing unit 12 and exchanges information with the information processing unit 12.
  • The information processing unit 12 includes, for example, a CPU (Central Processing Unit) 21, a ROM (Read Only Memory) 22, a RAM (Random Access Memory) 23, an input unit 24, an output unit 25, a storage unit 26, a communication unit 27, a connection port 28, and a drive 29.
  • The CPU 21 controls all or part of the operation of each component of the information processing unit 12 via the bus 31 and the input/output interface 32, based on various programs recorded in the ROM 22, the RAM 23, the storage unit 26, or the removable media 30.
  • the ROM 22 stores a program read into the CPU 21, data used for calculation, and the like.
  • the RAM 23 temporarily stores a program read into the CPU 21 and various parameters that change as appropriate when the program is executed.
  • the input unit 24 is a device for a user to input information, and may be, for example, a mouse, a keyboard, a touch panel, a microphone, a button, a switch, or the like.
  • the output unit 25 is a device that visually or audibly notifies the user of information, and may be, for example, a display device, an audio output device such as a speaker and headphones, a printer, a facsimile, or the like.
  • The storage unit 26 is a device for storing various types of data, and may be, for example, a magnetic storage device such as a hard disk drive, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
  • The communication unit 27 is a communication device for connecting to a network, and may be, for example, a wired or wireless LAN, Bluetooth (registered trademark), or the like.
  • The connection port 28 is a port for connecting an external device, and may be, for example, a USB port, an IEEE 1394 port, a SCSI port, an optical audio terminal, or the like.
  • the drive 29 is a device that reads or writes information to a removable medium 30 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
  • the information processing unit 12 can install the program in the storage unit 26 via the input / output interface 32 by mounting the removable media 30 storing the program executed by the CPU 21 in the drive 29.
  • The program may be received by the communication unit 27 via a wired or wireless transmission medium and installed in the storage unit 26, or may be installed in the ROM 22 or the storage unit 26 in advance. Further, the program executed by the CPU 21 may be a program whose processing is performed in chronological order in the order described in this specification, or may be a program whose processing is performed in parallel or at a necessary timing, such as when a call is made.
  • FIG. 2 is a functional block diagram illustrating the function of the information processing device 11 of FIG.
  • the information processing device 11 has a learning content control unit 41 and a learning environment control unit 42.
  • the learning content control unit 41 controls to provide the user with a problem according to the degree of understanding of the user's learning.
  • the learning environment control unit 42 controls the environment (learning environment) of the learning space in which the user learns in order to improve the learning state such as the degree of concentration of the user on learning.
  • the learning content control unit 41 includes a user interface unit 61, a user state sensing unit 62, a learning data analysis unit 63, and a problem generation unit 64.
  • the user interface unit 61 includes the input unit 24 and the output unit 25 of FIG. 1, presents information to the user, and receives information from the user.
  • the user interface unit 61 presents the problem q from the problem generation unit 64 to the user by the output unit 25. Further, when the user inputs information such as an answer to the question q from the input unit 24, the user interface unit 61 supplies the answer information R including the answer and the answer time information to the learning data analysis unit 63.
  • the user solves the problem q from the problem generation unit 64 presented by the user interface unit 61 in the learning environment E'. Then, the user inputs the answer from the user interface unit 61.
  • The answer time is the time required for the user to solve the problem q. The user may input the answer time from the user interface unit 61, or the learning data analysis unit 63 may calculate it based on the time from when the user interface unit 61 presents the problem q to the user until the answer is input from the user interface unit 61.
  • The user state sensing unit 62 includes some of the various sensors 13 shown in FIG. 1.
  • The user state sensing unit 62 senses the state (user state G) of the user who is learning using the user interface unit 61, and supplies it to the learning data analysis unit 63 and to the environment data analysis unit 82 of the learning environment control unit 42.
  • The state sensed by the user state sensing unit 62 with respect to the actual user state G' of the user is represented as the user state G, which includes the measurement error of each sensor.
  • The user state sensing unit 62 may include the arithmetic processing function of the CPU 21, and may acquire, as the user state G, the information obtained by the CPU 21 performing predetermined signal processing on the signals obtained directly from the various sensors 13.
  • FIG. 3 is a diagram illustrating the types of sensors that the user state sensing unit 62 can use for sensing the user state and the information (purpose of sensing) obtained by the sensors.
  • The sensors include, for example, GPS (Global Positioning System), a camera, a motion sensor, a microphone, a biometric information sensor, a depth sensor, an acceleration sensor, an angular velocity sensor, and the like.
  • GPS senses the user's position when the user carries or wears GPS. By sensing the user's position, the user's behavior such as whether the user is in the same position or moved can be grasped in addition to the place where the user is learning (home or outside, etc.).
  • the GPS may be mounted on a mobile terminal such as a smartphone.
  • the camera is one or more cameras that capture the user's learning space.
  • the user state sensing unit 62 senses the position of the user in the learning space based on the image obtained from the camera.
  • the user's behavior can be grasped by sensing the user's position.
  • the user's minute behavior such as facial movement can be grasped from the image obtained from the camera.
  • the motion sensor is installed in the learning space and senses the position of the user in the learning space using infrared rays or the like. In addition, the user's behavior can be grasped by sensing the user's position.
  • the microphone is installed in the learning space and senses the user's voice. By sensing the user's voice, the state of fatigue of the user can be grasped.
  • the biological information sensor senses a biological state such as a user's pulse, sweating, brain wave, touch, smell, or taste. By sensing the user's pulse, sweating, and brain waves, the degree of concentration of the user on learning can be grasped.
  • sensing the user's sense of touch, smell, or taste means sensing how much the user's sense of touch, smell, or taste is working. By sensing the user's sense of touch, smell, or taste, the degree of concentration of the user on learning can be grasped.
  • the depth sensor senses depth information (three-dimensional position including the depth direction) in the user's learning space. By sensing the depth information, the user's three-dimensional position and behavior can be grasped.
  • the acceleration sensor senses the user's acceleration when the user carries or wears the acceleration sensor. By sensing the acceleration of the user, it is possible to grasp minute actions (movements) such as a change in posture that does not accompany the movement of the position in addition to the movement of the position of the user.
  • the acceleration sensor may be mounted on a mobile terminal such as a smartphone carried by the user.
  • the angular velocity sensor senses the user's angular velocity when the user carries or wears the angular velocity sensor. By sensing the user's angular velocity, it is possible to grasp minute actions (movements) that change the direction of the user.
  • The user state sensing unit 62 does not have to have all the types of sensors shown in FIG. 3, and may have other types of sensors as long as they sense the user state. Further, the user state to be detected may be any one or more of the user's position, behavior, orientation, pulse, sweating, brain waves, touch, smell, and taste.
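  • As a rough illustration of how the sensed user state G described above might be bundled before it is passed to the learning data analysis unit 63 and the environment data analysis unit 82, the following sketch defines a simple container. The class and field names (UserState, pulse_bpm, and so on) are hypothetical choices for illustration; the publication does not specify any particular data format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical container for the user state G described above.
# Every field is optional because any subset of position, behavior,
# orientation, pulse, sweating, brain waves, etc. may be sensed.
@dataclass
class UserState:
    position: Optional[Tuple[float, float, float]] = None  # from GPS / camera / depth sensor
    behavior: Optional[str] = None           # e.g. "seated", "moving"
    orientation_deg: Optional[float] = None  # facing direction, from the angular velocity sensor
    pulse_bpm: Optional[float] = None        # from the biometric information sensor
    sweating: Optional[float] = None         # relative perspiration level
    eeg_alpha_power: Optional[float] = None  # simplified stand-in for brain-wave data

# Example: a user state assembled from a subset of sensors.
g = UserState(position=(1.2, 0.4, 0.0), behavior="seated", pulse_bpm=72.0)
print(g)
```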
  • The learning data analysis unit 63 is a functional block realized by the arithmetic processing of the CPU 21 of FIG. 1. Based on the answer information R from the user interface unit 61 and the user state G from the user state sensing unit 62, it analyzes the learning state (how well learning is going), such as the user's degree of concentration on learning, degree of comprehension (degree of learning), and learning speed.
  • the learning data analysis unit 63 supplies the analysis result AI representing the learning state of the analyzed user to the problem generation unit 64.
  • the degree of concentration of the user on learning means an index of the user's concentration.
  • The learning data analysis unit 63 can obtain the degree of concentration based on, for example, the user state G from the user state sensing unit 62, and in particular may obtain it from the user's biological information. Further, the learning data analysis unit 63 may obtain the degree of concentration from the elapsed time since the user started learning, the time of day, the correct answer rate for the problems, the transition of the time required to answer, and the like.
  • the user's understanding of learning means an index of the user's understanding of a predetermined learning area.
  • the learning area refers to the range of learning consisting of learning units such as grades, subjects, and units.
  • the learning data analysis unit 63 obtains the degree of understanding based on the correct answer rate of the question, the time required for the answer, and the questionnaire result for the question.
  • The learning speed means an index of how quickly the user comes to understand material once learned. The learning data analysis unit 63 obtains the learning speed based on, for example, whether the user was able to solve a problem previously answered incorrectly (and, if so, the time required to answer), whether the user was able to solve a similar problem related to a previously correct answer (and, if so, the time required to answer), and the correct answer rate and answer time when re-answering previously incorrect problems.
  • The learning data analysis unit 63 supplies a part or all of the analyzed learning state (analysis result AI) as the learning information C to the environment data analysis unit 82 of the learning environment control unit 42.
  • For example, the learning data analysis unit 63 supplies the analyzed degree of concentration and degree of comprehension of the user, out of the degree of concentration, degree of comprehension, and learning speed, to the environment data analysis unit 82 as the learning information C.
  • The learning data analysis unit 63 may also supply all of the degree of concentration, the degree of comprehension, and the learning speed, or any one or two of them, to the environment data analysis unit 82 as the learning information C.
  • The learning information C supplied from the learning data analysis unit 63 to the environment data analysis unit 82 is information representing the user's current learning state (how well learning is going), and information other than the degree of concentration, the degree of comprehension, and the learning speed may be used.
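  • The indices above (degree of concentration, degree of comprehension, learning speed) are described only qualitatively. The sketch below shows one plausible way such indices could be derived from answer information, purely as an illustration; the formulas, weights, and function names are assumptions, not the analysis method claimed in the publication.

```python
from statistics import mean

def comprehension_from_answers(correct_flags, answer_times_s, max_time_s=120.0):
    """Hypothetical comprehension index in [0, 1]: correct-answer rate,
    discounted by how long the answers took relative to a time budget."""
    if not correct_flags:
        return 0.0
    accuracy = sum(correct_flags) / len(correct_flags)
    speed_factor = 1.0 - min(mean(answer_times_s) / max_time_s, 1.0)
    return 0.5 * accuracy + 0.5 * accuracy * speed_factor

def learning_speed_from_retries(solved_retry_flags, retry_times_s, max_time_s=120.0):
    """Hypothetical learning-speed index: how often previously missed
    problems are solved on retry, weighted by the retry answer time."""
    if not solved_retry_flags:
        return 0.0
    retry_rate = sum(solved_retry_flags) / len(solved_retry_flags)
    speed_factor = 1.0 - min(mean(retry_times_s) / max_time_s, 1.0)
    return 0.5 * retry_rate + 0.5 * retry_rate * speed_factor

# Example answer information R for a short session.
print(comprehension_from_answers([1, 1, 0, 1], [40.0, 55.0, 110.0, 35.0]))
print(learning_speed_from_retries([1, 0, 1], [50.0, 120.0, 45.0]))
```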
  • The problem generation unit 64 generates a problem according to the learning state of the user, for example a problem whose difficulty level corresponds to the learning state of the user, based on the analysis result AI from the learning data analysis unit 63, and supplies it to the user interface unit 61.
  • Patent Document 1: International Publication No. 2016/0884663
  • the technique described in Patent Document 1 may be applied to the learning content control unit 41.
  • However, the technique of Patent Document 1 does not have the learning environment control unit 42 of FIG. 2. Therefore, the technique of Patent Document 1 does not improve learning efficiency by controlling the learning environment as the information processing device 11 to which the present technology is applied does.
  • In contrast, the learning environment control unit 42 acquires the learning information C including the learning state of the user from the learning content control unit 41 and changes the learning environment, such as its temperature, so that the learning state is improved; therefore, the learning efficiency can be improved appropriately.
  • the learning environment control unit 42 includes a learning environment sensing unit 81, an environmental data analysis unit 82, and an environment control unit 83.
  • The learning environment sensing unit 81 includes some of the various sensors 13 shown in FIG. 1.
  • the learning environment sensing unit 81 senses the learning environment of the user and supplies the environment information E representing the current state of the learning environment to the environment data analysis unit 82.
  • the learning environment sensing unit 81 may include the arithmetic processing function of the CPU 21 and acquire the information obtained by the CPU 21 performing predetermined signal processing on the signals directly obtained from the various sensors 13 as the environment information E.
  • FIG. 4 is a diagram illustrating the elements of the learning environment sensed by the learning environment sensing unit 81 and the types of sensors.
  • In FIG. 4, sound, video, illuminance, temperature, humidity, atmospheric pressure, the open/closed state of windows and doors, the clutter of the room, the presence or absence of others, the weather, and time (learning time, time of day) are shown as elements of the learning environment sensed by the learning environment sensing unit 81.
  • The loudness of sound such as noise in the learning space is detected by a microphone installed in the learning space.
  • In video sensing, for example, information on whether or not video is being displayed is acquired from a video display device (display, etc.) deployed in the learning space, or from a video output device that supplies video to that video display device, and it is detected whether or not video is being displayed on the video display device.
  • The illuminance level of the learning space is detected by an illuminance sensor installed in the learning space.
  • The illuminance level of the learning space may also be detected by acquiring information on the illuminance set value of the lighting device that illuminates the learning space.
  • The illuminance level of the learning space may also be detected by analyzing the image from the camera that captures the learning space.
  • The temperature of the learning space is detected by the temperature sensor built into the air conditioning equipment of the learning space, or by a temperature sensor installed in the learning space separately from the air conditioning equipment.
  • The humidity of the learning space is detected by the humidity sensor built into the air conditioning equipment of the learning space, or by a humidity sensor installed in the learning space separately from the air conditioning equipment.
  • The atmospheric pressure of the learning space is detected by the atmospheric pressure sensor built into the air conditioning equipment of the learning space, or by an atmospheric pressure sensor installed in the learning space separately from the air conditioning equipment.
  • The open/closed state of the windows and doors that enclose the learning space is detected based on the image from the camera that captures the learning space, or by an open/close sensor installed on the window or door.
  • The clutter of the room is detected based on the image from the camera that captures the space of the learning environment.
  • The presence or absence of others is detected by analyzing whether or not there are multiple persons in the learning space based on the image from the camera that captures the learning space.
  • The presence or absence of others may also be detected by a motion sensor or a depth sensor installed in the learning space.
  • In weather sensing, it is detected whether the weather is sunny, cloudy, or rainy based on information from the humidity sensor, the temperature sensor, and the illuminance sensor.
  • the weather information may be obtained from an internet site or the like.
  • time information is acquired from the clock function built in the information processing unit 12 or a specific server on the Internet, and the learning time (elapsed time from the start of learning) and the current time are detected.
  • The learning environment elements sensed by the learning environment sensing unit 81 may be any one or more of the elements of the learning environment shown in FIG. 4.
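  • The elements of the learning environment listed above can be thought of as one record per sensing cycle of the learning environment sensing unit 81. The sketch below is an assumed representation; the class name EnvironmentInfo and its fields are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record for the environment information E of FIG. 4.
@dataclass
class EnvironmentInfo:
    noise_db: Optional[float] = None         # microphone
    video_on: Optional[bool] = None          # video display / output device
    illuminance_lx: Optional[float] = None   # illuminance sensor or lighting set value
    temperature_c: Optional[float] = None    # temperature sensor
    humidity_pct: Optional[float] = None     # humidity sensor
    pressure_hpa: Optional[float] = None     # atmospheric pressure sensor
    window_open: Optional[bool] = None       # open/close sensor or camera
    room_clutter: Optional[float] = None     # image analysis, 0 (tidy) to 1 (cluttered)
    others_present: Optional[bool] = None    # camera / motion sensor / depth sensor
    weather: Optional[str] = None            # "sunny", "cloudy", "rainy", or from a web service
    learning_time_min: Optional[float] = None  # elapsed time since learning started
    time_of_day: Optional[str] = None        # e.g. "21:30"

e = EnvironmentInfo(noise_db=42.0, temperature_c=26.5, illuminance_lx=300.0)
print(e)
```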
  • The environment data analysis unit 82 is a functional block realized by the arithmetic processing of the CPU 21 of FIG. 1, and analyzes the influence of the learning environment on the user's learning state based on the environment information E from the learning environment sensing unit 81, the user state G from the user state sensing unit 62, and the learning information C from the learning data analysis unit 63.
  • Based on the analysis result, the environment data analysis unit 82 calculates changes to the learning environment that improve the user's learning state relative to the present state.
  • The environment data analysis unit 82 supplies the control content for the learning environment (the changes to the learning environment) for transitioning to a learning environment in which the user's learning state is improved to the environment control unit 83 as the analysis result Ae.
  • the environmental control unit 83 includes the arithmetic processing function of the CPU 21 shown in FIG. 1 and various environmental control devices 14.
  • the environment control unit 83 controls the various environment control devices 14 shown in FIG. 2 based on the analysis result Ae from the environment data analysis unit 82, and controls the learning environment E'.
  • FIG. 5 is a diagram illustrating the elements of the learning environment controlled by the environment control unit 83 and the types of environment control devices used to control each element.
  • FIG. 5 illustrates noise, music, illuminance, temperature, humidity, information presentation (video), communication, and room clutter as elements of the learning environment controlled by the environment control unit 83.
  • Noise and music are both elements related to the sound of the learning environment, but they are separate elements in the control of the learning environment.
  • In noise control, whether or not to perform noise cancellation is controlled by an audio device connected to a speaker installed in the learning space or to headphones worn by the user.
  • For noise cancellation, the audio device supplies a sound having a phase opposite to the noise to the speaker or headphones.
  • In music control, whether or not to output music and the music volume are controlled by audio equipment connected to the speakers installed in the learning space or to the headphones worn by the user.
  • The music selection may also be controlled, for example based on the genre of the music.
  • The illuminance level of the learning space is controlled by the lighting equipment installed in the learning space.
  • The temperature of the learning space is controlled by the air conditioner.
  • The humidity of the learning space is controlled by the air conditioner.
  • In the control of information presentation (video), whether or not to display video is controlled by the video display device (a device that outputs video to a display) installed in the learning space.
  • the video may be information for deepening the user's understanding, or may be information for increasing the concentration.
  • FIG. 6 is a flowchart illustrating a processing example performed by the information processing apparatus 11 of FIG.
  • steps S11 to S14 show a processing example of learning content control performed by the learning content control unit 41
  • steps S15 to S17 show a processing example of learning environment control performed by the learning environment control unit 42.
  • In step S11, when the user solves the problem q in the learning environment E' and inputs an answer, the user interface unit 61 accepts the user's answer. Then, the user interface unit 61 supplies the answer information R including the user's answer and the answer time to the learning data analysis unit 63. The process proceeds from step S11 to step S12.
  • In step S12, the user state sensing unit 62 senses the state G' of the user who is learning and acquires the user state G. Then, the user state sensing unit 62 supplies the acquired user state G to the learning data analysis unit 63 of the learning content control unit 41 and to the environment data analysis unit 82 of the learning environment control unit 42. The process proceeds from step S12 to step S13.
  • In step S13, the learning data analysis unit 63 uses the answer information R supplied from the user interface unit 61 in step S11 and the user state G supplied from the user state sensing unit 62 in step S12 to analyze the learning state, such as the user's degree of concentration, degree of comprehension (degree of learning), and learning speed, and calculates the analysis result AI representing the learning state of the user.
  • The learning data analysis unit 63 supplies the calculated analysis result AI to the problem generation unit 64.
  • Further, the learning data analysis unit 63 supplies, for example, the degree of concentration and the degree of comprehension among the learning states obtained by the analysis to the environment data analysis unit 82 as the learning information C. The process proceeds from step S13 to step S14.
  • In step S14, the problem generation unit 64 generates the problem q according to the learning state of the user based on the analysis result AI supplied from the learning data analysis unit 63 in step S13.
  • The problem q generated by the problem generation unit 64 is supplied to the output unit 25 (see FIG. 1) of the user interface unit 61.
  • After step S14, the process returns to step S11, and steps S11 to S14 are repeated.
  • In step S15, the learning environment sensing unit 81 of the learning environment control unit 42 senses the environment (learning environment) of the learning space in which the user is learning, and acquires the environment information E.
  • The learning environment sensing unit 81 supplies the acquired environment information E to the environment data analysis unit 82. The process proceeds from step S15 to step S16.
  • In step S16, the environment data analysis unit 82 uses the user state G supplied from the user state sensing unit 62 in step S12, the learning information C supplied from the learning data analysis unit 63 in step S13, and the environment information E supplied from the learning environment sensing unit 81 in step S15 to analyze whether or not the current degree of concentration is improved compared with the degree of concentration indicated by the learning information C supplied from the learning data analysis unit 63 the previous time.
  • Then, the environment data analysis unit 82 calculates, as the analysis result Ae, the next change content (control content) for the learning environment that improves the degree of concentration.
  • The environment data analysis unit 82 may also analyze whether the learning state is improved in consideration of not only the degree of concentration but also the degree of comprehension of the user, and may calculate the changes to the learning environment that improve the learning state as the analysis result Ae. The process proceeds from step S16 to step S17.
  • In step S17, the environment control unit 83 controls the learning environment based on the analysis result Ae supplied from the environment data analysis unit 82 in step S16 so that the learning environment E' becomes suitable for the user's learning.
  • After step S17, the process returns to step S15, and steps S15 to S17 are repeated.
  • As described above, because the learning environment control unit 42 changes the learning environment so that the user's learning state is improved based on the learning information C from the learning content control unit 41, the learning environment is appropriately changed to be suitable for the user's learning, and the learning efficiency is improved appropriately.
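  • The two loops of FIG. 6 described above (learning content control in steps S11 to S14 and learning environment control in steps S15 to S17) can be summarized by the following hedged sketch. The callables passed in are stand-ins for the functional blocks of FIG. 2, not an interface defined by the publication.

```python
def learning_content_step(accept_answer, sense_user, analyze_learning, generate_problem, present):
    """One pass of steps S11 to S14 of FIG. 6 (hypothetical decomposition)."""
    answer_info_r = accept_answer()                    # S11: answer and answer time
    user_state_g = sense_user()                        # S12: user state G
    analysis_ai, learning_info_c = analyze_learning(answer_info_r, user_state_g)  # S13
    present(generate_problem(analysis_ai))             # S14: next problem q
    return learning_info_c, user_state_g

def learning_environment_step(sense_environment, analyze_environment, control_environment,
                              user_state_g, learning_info_c):
    """One pass of steps S15 to S17 of FIG. 6 (hypothetical decomposition)."""
    environment_info_e = sense_environment()                                   # S15
    analysis_ae = analyze_environment(environment_info_e, user_state_g, learning_info_c)  # S16
    control_environment(analysis_ae)                                           # S17

# Tiny demonstration with dummy callables standing in for the functional blocks of FIG. 2.
c, g = learning_content_step(
    accept_answer=lambda: {"correct": True, "time_s": 42.0},
    sense_user=lambda: {"pulse_bpm": 70.0},
    analyze_learning=lambda r, g: ({"difficulty": "medium"}, {"concentration": 0.7}),
    generate_problem=lambda ai: "next problem q",
    present=print,
)
learning_environment_step(
    sense_environment=lambda: {"temperature_c": 27.0},
    analyze_environment=lambda e, g, c: {"temperature": "lower"},
    control_environment=print,
    user_state_g=g, learning_info_c=c,
)
```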
  • FIG. 7 is a functional block diagram illustrating details of a processing example of learning environment control in the learning environment control unit 42.
  • FIG. 7 shows the environment data analysis unit 82 and the environment control unit 83 of FIG. 2, together with an environment control database 101 that is not shown in FIG. 2.
  • the environment control database 101 is a functional block realized by the storage unit 26 of FIG. 1, and stores data and the like referred to by the environment data analysis unit 82.
  • FIG. 7 shows a configuration example of the environment control unit 83, and the environment control unit 83 includes a control signal generation unit 91 and devices 92A to 92N.
  • the control signal generation unit 91 is a functional block realized by the arithmetic processing of the CPU 21 of FIG.
  • the control signal generation unit 91 generates a control signal for controlling each of the devices 92A to 92N based on the analysis result Ae from the environment data analysis unit 82.
  • the analysis result Ae from the environment data analysis unit 82 includes, for example, information on the change contents (control contents) for each element of the learning environment shown in FIG.
  • each of the devices 92A to 92N is an environmental control device used for controlling each element of the learning environment.
  • the devices 92A to 92N are associated with elements of the learning environment that can be changed by each.
  • The control signal generation unit 91 generates control signals for the devices 92A to 92N associated with the respective elements of the learning environment so that each element of the learning environment is changed according to the control content indicated by the analysis result Ae from the environment data analysis unit 82. The control signal generation unit 91 then supplies the generated control signals to the devices 92A to 92N to control them.
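  • A minimal sketch of how the control signal generation unit 91 might map the per-element control content in the analysis result Ae onto the devices 92A to 92N follows. The dictionary-based dispatch, the element keys, and the device stand-ins are assumptions made only for illustration.

```python
# Hypothetical analysis result Ae: per-element control content.
analysis_result_ae = {
    "noise": {"noise_cancellation": True},        # device 92A (audio device)
    "music": {"play": True, "volume_delta": -1},  # device 92B (audio device)
    "illuminance": {"delta": +1},                 # device 92C (lighting equipment)
    "temperature": {"delta": -1},                 # device 92D (air conditioner)
    "information": {"show_video": False},         # device 92E (video display device)
    "communication": {"block": True},             # device 92N (communication terminal / electric lock)
}

# Stand-in "devices": here simply functions that print the control signal they receive.
def make_device(name):
    return lambda control: print(f"{name} <- {control}")

devices = {
    "noise": make_device("device 92A"),
    "music": make_device("device 92B"),
    "illuminance": make_device("device 92C"),
    "temperature": make_device("device 92D"),
    "information": make_device("device 92E"),
    "communication": make_device("device 92N"),
}

def generate_control_signals(ae, devices):
    """Dispatch each element's control content to the device associated with that element."""
    for element, control in ae.items():
        device = devices.get(element)
        if device is not None:
            device(control)

generate_control_signals(analysis_result_ae, devices)
```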
  • The device 92A is an environment control device used for controlling the noise of the learning environment shown in FIG. 5. The device 92A is, for example, an audio device that cancels noise by outputting a sound having a phase opposite to that of the noise to a speaker installed in the learning space or to headphones worn by the user.
  • The control signal generation unit 91 controls the device 92A according to the control content included in the analysis result Ae, namely whether or not to perform noise cancellation.
  • The device 92B is an environment control device used for controlling the music of the learning environment shown in FIG. 5.
  • The device 92B is, for example, an audio device that outputs music to a speaker installed in the learning space or to headphones worn by the user.
  • The control signal generation unit 91 controls the device 92B according to the control content included in the analysis result Ae, namely whether or not to output music.
  • The analysis result Ae may include a control content for raising or lowering the music volume. In that case, the control signal generation unit 91 controls the device 92B to raise or lower the music volume by a certain amount.
  • The analysis result Ae may include a control content for changing the song (music genre, etc.). In that case, the control signal generation unit 91 controls the device 92B to change the song.
  • The device 92C is an environment control device used for controlling the illuminance of the learning environment shown in FIG. 5.
  • The device 92C is, for example, a lighting device installed in the learning space.
  • The control signal generation unit 91 controls the device 92C according to the control content included in the analysis result Ae, namely whether to increase, decrease, or maintain the illuminance level, so that the illuminance level of the learning environment is increased or decreased by a certain amount, or maintained.
  • The device 92D is an environment control device used for controlling the temperature of the learning environment shown in FIG. 5, and is, for example, an air-conditioning device installed in the learning space.
  • The control signal generation unit 91 controls the device 92D according to the control content included in the analysis result Ae, namely whether to raise, lower, or maintain the temperature, so that the temperature of the learning environment is raised or lowered by a certain amount, or maintained.
  • The device 92E is an environment control device used for controlling the information presentation (video) shown in FIG. 5, and is, for example, a video display device installed in the learning space.
  • The control signal generation unit 91 controls the device 92E according to the control content included in the analysis result Ae, namely whether or not to display video, and thereby controls whether or not video is displayed in the learning space.
  • the content of the information (video) to be displayed includes, for example, information for deepening the user's understanding.
  • The device 92N is an environment control device used for controlling factors that cause the user to lose concentration, and is, for example, a communication terminal used for controlling the communication shown in FIG. 5.
  • The control signal generation unit 91 controls the device 92N according to the control content included in the analysis result Ae, namely whether or not to block communication.
  • For example, the control signal generation unit 91 uses the device 92N to control whether or not to block contact with the user from others by e-mail, telephone, or the like, or whether or not to announce to others registered in advance that they should not contact the user.
  • As the device 92N, an electric lock that controls the key of the entrance door to the learning space may be used.
  • In that case, the control signal generation unit 91 controls the device 92N according to the control content included in the analysis result Ae, namely whether or not to block communication, and thereby controls whether or not to lock the door.
  • The devices 92A to 92N shown in FIG. 7 are examples; the environment control unit 83 may have any one or more of the devices 92A to 92N, or may have devices other than the devices 92A to 92N.
  • For example, the environment control unit 83 may have, as environment control devices, a device that controls the humidity of the learning environment (for example, an air conditioner) and a device that controls the clutter of the room (a robot, etc.).
  • In that case, the control signal generation unit 91 controls the device that controls the humidity of the learning environment according to the control content included in the analysis result Ae, namely whether to raise, lower, or maintain the humidity level, so that the humidity of the learning environment is raised or lowered by a certain amount, or maintained.
  • The control signal generation unit 91 controls the device that controls the clutter of the room according to the control content included in the analysis result Ae, namely whether to reduce the clutter of the room, and thereby controls whether or not the room is cleaned and kept tidy.
  • Further, the environment control unit 83 may control the scent of the learning environment with, for example, an aroma diffuser according to the analysis result Ae, or may lock the TV, game console, or smartphone to reduce temptation.
  • Further, the environment control unit 83 may set the fixed-line telephone to answering machine mode according to the analysis result Ae, or may prevent the intercom from sounding. The environment control unit 83 may also operate an espresso machine according to the analysis result Ae to urge the user to take a break, or may control a humanoid robot or an animal robot to support the user.
  • The environment data analysis unit 82 uses the environment information E from the learning environment sensing unit 81, the user state G from the user state sensing unit 62, and the learning information C from the learning data analysis unit 63 as input data, calculates as output data the analysis result Ae indicating the control content (change content) for each element of the learning environment, and supplies it to the control signal generation unit 91.
  • As shown in FIG. 4, the environment information E includes information on sound, video, illuminance, temperature, humidity, atmospheric pressure, the open/closed state of windows and doors, the clutter of the room, the presence or absence of others other than the user, the weather, time, and the like. Any one or more of these pieces of information are given to the environment data analysis unit 82 as the environment information E.
  • As shown in FIG. 3, the user state G includes states related to the user's position, behavior, orientation, pulse, sweating, brain waves, touch, smell, and taste. Information on any one or more of these states is given to the environment data analysis unit 82 as the user state G.
  • The learning information C includes information on the learning state, such as the user's degree of concentration on learning, degree of comprehension, and learning speed.
  • For example, information on the degree of concentration and the degree of comprehension is given to the environment data analysis unit 82 as the learning information C. Alternatively, any one or more pieces of information on the learning state, such as the user's degree of concentration, degree of comprehension, and learning speed, may be given to the environment data analysis unit 82 as the learning information C.
  • The environment data analysis unit 82 compares each of the environment information E, the user state G, and the learning information C with its past values. For example, it calculates how much the user state G and the learning information C have changed with respect to the amount of change in the environment information E, and analyzes which element of the learning environment has influenced the change in the user state G and the learning information C.
  • As a result of the analysis, the environment data analysis unit 82 calculates the changes to the learning environment (changes for each element of the learning environment) that improve (increase) the learning state, such as the user's degree of concentration and degree of comprehension, and supplies them to the control signal generation unit 91 as the analysis result Ae.
  • For an environment control device that controls an element of the learning environment with a large influence on the learning state, the environment data analysis unit 82 may switch to a control policy (control content) different from the previous one. For example, when the user's learning state deteriorates as a result of having the device 92B output music, the environment data analysis unit 82 may change the genre of the music output by the device 92B instead of stopping the music output.
  • The environment data analysis unit 82 may also set a control policy that maintains the state of the current learning environment or changes each element of the current learning environment within a predetermined range. The environment data analysis unit 82 outputs the set control policy to the control signal generation unit 91 as the analysis result Ae, aiming to further improve the user's learning state.
  • The environment data analysis unit 82 stores the control policy, that is, the analysis result Ae, for the environment information E, the user state G, and the learning information C obtained when the user is learning in the environment control database 101, thereby building a database.
  • This makes it possible for the environment data analysis unit 82 to promptly transition the user's learning state to a good state regardless of the pattern of the environment information E and the user state G.
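  • One plausible, purely illustrative way to build the database described above is to key stored control policies by a coarse pattern of the environment information E, the user state G, and the learning information C, as in the sketch below; the bucketing rule in pattern_key is an assumption.

```python
# Hypothetical environment control database: pattern of (E, G, C) -> control policy Ae.
environment_control_db = {}

def pattern_key(env_info_e, user_state_g, learning_info_c):
    """Coarsely bucket continuous values so that similar situations share a key (an assumption)."""
    return (
        round(env_info_e.get("noise_db", 0.0) / 10.0),
        round(env_info_e.get("temperature_c", 0.0)),
        user_state_g.get("behavior", "unknown"),
        round(learning_info_c.get("concentration", 0.0), 1),
    )

def store_policy(env_info_e, user_state_g, learning_info_c, analysis_ae):
    environment_control_db[pattern_key(env_info_e, user_state_g, learning_info_c)] = analysis_ae

def lookup_policy(env_info_e, user_state_g, learning_info_c):
    return environment_control_db.get(pattern_key(env_info_e, user_state_g, learning_info_c))

# Store a policy for one observed situation, then retrieve it for a similar situation.
store_policy({"noise_db": 48.0, "temperature_c": 27.0}, {"behavior": "seated"},
             {"concentration": 0.4}, {"noise": {"noise_cancellation": True}})
print(lookup_policy({"noise_db": 51.0, "temperature_c": 27.2}, {"behavior": "seated"},
                    {"concentration": 0.42}))
```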
  • The environment data analysis unit 82 may use any one of the degree of concentration, the degree of comprehension, the learning speed, and the like as an evaluation value representing the user's learning state, or may use a weighted average of the elements representing the learning state (degree of concentration, degree of comprehension, learning speed, etc.) as the evaluation value.
  • In that case, the environment data analysis unit 82 can determine that the larger the evaluation value, the better the learning state.
  • The environment data analysis unit 82 may change the state of each element of the learning environment that can be changed by the environment control unit 83 so as to find the state in which the evaluation value is maximized.
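  • The evaluation value described above, a weighted average of the elements representing the learning state, could be written as in the following sketch. The particular weights are illustrative; the publication does not fix any values.

```python
def evaluation_value(learning_info_c, weights=None):
    """Weighted average of the learning-state elements present in the learning information C.
    Setting all weights except one to 0 reduces the value to that single element."""
    weights = weights or {"concentration": 0.5, "comprehension": 0.3, "learning_speed": 0.2}
    used = {k: w for k, w in weights.items() if k in learning_info_c and w > 0.0}
    if not used:
        return 0.0
    total_w = sum(used.values())
    return sum(learning_info_c[k] * w for k, w in used.items()) / total_w

print(evaluation_value({"concentration": 0.8, "comprehension": 0.6}))
print(evaluation_value({"concentration": 0.8, "comprehension": 0.6},
                       weights={"concentration": 1.0, "comprehension": 0.0}))  # concentration only
```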
  • The environment data analysis unit 82 may use a learning model trained by deep learning or other machine learning for the calculation process of the changes to the learning environment (the analysis result Ae, which is the control policy).
  • The learning method in machine learning may be supervised learning or reinforcement learning. In the following, the process of calculating the analysis result Ae using a learning model will be described.
  • (First processing example of the environment data analysis unit 82) As a first processing example of the environment data analysis unit 82, a case where a learning model (DNN) trained by supervised learning using machine learning (deep learning) is used will be described.
  • The environment data analysis unit 82 calculates the analysis result Ae using a learning model that takes as input data the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63, and that outputs output data indicating an appropriate control policy for the learning environment (changes to the learning environment that improve the user's learning state).
  • A control policy represents the change content (control content) for all the elements of the learning environment to be changed (controlled), and there are as many control policies as there are combinations of the control contents that can be adopted for each element.
  • The learning model has an output node corresponding to each of all the control policies, and each output node outputs, for example, a value in the range of 0 to 1. The output value from the output node corresponding to each control policy represents the appropriateness of that control policy.
  • The environment data analysis unit 82 determines the control policy whose appropriateness output from the output node of the learning model is maximal as the control policy that improves the user's learning state (for example, the degree of concentration). The environment data analysis unit 82 then supplies the determined control policy to the control signal generation unit 91 of the environment control unit 83 as the analysis result Ae.
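  • A small sketch of the inference step described above: the model outputs one appropriateness value per control policy, and the policy with the maximum value becomes the analysis result Ae. A tiny randomly weighted network in plain Python stands in for the trained DNN; its size, weights, input encoding, and the policy list are all assumptions.

```python
import math
import random

# Hypothetical list of control policies: one entry per combination of control contents.
CONTROL_POLICIES = [
    {"noise_cancellation": nc, "music": m, "temperature": t}
    for nc in (False, True) for m in ("off", "on") for t in ("lower", "keep", "raise")
]

random.seed(0)

def tiny_dnn(inputs, n_outputs):
    """Stand-in for the trained DNN: one random linear layer plus a sigmoid, so each
    output node yields a value in (0, 1) interpreted as that policy's appropriateness."""
    weights = [[random.uniform(-1, 1) for _ in inputs] for _ in range(n_outputs)]
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs)))) for row in weights]

# Input data: a flattened (E, G, C) feature vector; an illustrative encoding only.
features = [0.4,   # normalized noise level       (from E)
            0.7,   # normalized temperature       (from E)
            0.6,   # pulse, normalized            (from G)
            0.35]  # degree of concentration      (from C)

appropriateness = tiny_dnn(features, len(CONTROL_POLICIES))
best = max(range(len(CONTROL_POLICIES)), key=lambda i: appropriateness[i])
analysis_ae = CONTROL_POLICIES[best]
print(analysis_ae, round(appropriateness[best], 3))
```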
  • the input data to the learning model may be any one or more of the environment information E, the user state G, and the learning information C.
  • Until a user-specific learning model is generated by machine learning (deep learning), the environment data analysis unit 82 collects training data using a trained learning model stored in advance in the environment control database 101 (a learning model for collecting training data).
  • The learning model for collecting training data may be a learning model unrelated to the user (for example, an untrained learning model), or may be a learning model corresponding to the user's tendencies with respect to learning.
  • In the latter case, for example, the environment data analysis unit 82 has the user input, from the input unit 24 (see FIG. 1), conditions under which the learning state (degree of concentration, etc.) improves for each element of the learning environment to be controlled, and acquires these as user information.
  • The conditions under which the learning state improves for each element of the controlled learning environment are conditions by which the user judges that the learning state improves for each element of the learning environment, such as whether the user can concentrate on learning when there is noise, whether the user can concentrate on learning while music is playing, and at what temperature the user can concentrate on learning.
  • The environment control database 101 stores, for each set of identical or similar user information, a learning model that calculates a substantially appropriate control policy.
  • The environment data analysis unit 82 reads from the environment control database 101 a learning model corresponding to user information that is the same as or similar to the user information acquired from the user, and uses it as the learning model for collecting training data.
  • The environment data analysis unit 82 evaluates the goodness of the user's learning state when collecting training data using the learning model for collecting training data.
  • The environment data analysis unit 82 calculates the evaluation value Z based on the learning information C from the learning data analysis unit 63.
  • When the learning information C includes a plurality of pieces of information (elements), for example when it includes the degree of concentration and the degree of comprehension, or when it includes information such as the learning speed in addition to the degree of concentration and the degree of comprehension, the environment data analysis unit 82 sets the weighted average of those elements as the evaluation value Z.
  • The evaluation value Z may be a weighted average in which the weights of all but one element of the learning information C are set to 0; in this case, the evaluation value Z is the value of that one element of the learning information C.
  • The evaluation value Z takes a higher value the better the user's learning state is (the higher the degree of concentration and comprehension), and the larger the increase in the evaluation value Z caused by the control of the learning environment based on the analysis result Ae, the more it indicates that the environment control based on the analysis result Ae was an appropriate control content for improving the user's learning state.
  • After supplying the analysis result Ae to the environment control unit 83 and changing the learning environment, the environment data analysis unit 82 acquires the environment information E from the learning environment sensing unit 81, the user state G from the user state sensing unit 62, and the learning information C from the learning data analysis unit 63.
  • The environment data analysis unit 82 calculates the next control policy using the learning model based on the newly acquired environment information E, user state G, and learning information C, and supplies it to the environment control unit 83 as the analysis result Ae.
  • the environment data analysis unit 82 calculates the evaluation value Z based on the newly acquired learning information C every time the learning environment is changed.
  • The time t is represented by the number of time steps, where one time step is the time from a change of the learning environment by the analysis result Ae at a certain time until the next change of the learning environment. It is assumed that one time step is longer than the time from when the environment control unit 83 changes the learning environment based on the analysis result Ae until the change in the learning environment appears as an effect on the user's learning state.
  • The environment information E, the user state G, and the learning information C acquired by the environment data analysis unit 82 at time t are represented by E[t], G[t], and C[t], respectively, and the evaluation value Z calculated based on the learning information C[t] is represented by Z[t]. Further, the analysis result Ae calculated based on the environment information E[t], the user state G[t], and the learning information C[t] is represented by Ae[t].
  • The environment data analysis unit 82 obtains the increase ΔZ[t+1] of the evaluation value Z[t+1] at time t+1 with respect to the evaluation value Z[t] at time t, and when the increase ΔZ[t+1] is equal to or greater than a predetermined threshold value ΔZs, it determines that the control of the learning environment by the analysis result Ae[t] at time t was an appropriate control content.
  • When the environment data analysis unit 82 determines that the control of the learning environment by the analysis result Ae[t] was an appropriate control content, it stores the environment information E[t], the user state G[t], and the learning information C[t] in the environment control database 101 as the input data of the training data.
  • Further, the environment data analysis unit 82 stores in the environment control database 101, as the teacher data of the training data, the output data of the learning model obtained when the environment information E[t], the user state G[t], and the learning information C[t] are input as input data to the learning model for collecting training data.
  • The teacher data is the output data from the learning model when the environment information E[t], the user state G[t], and the learning information C[t] are input as input data to the learning model for collecting training data, but the output data may be adjusted. For example, data obtained by adjusting the output value from the output node of the learning model corresponding to the control policy set as the analysis result Ae[t] to a value close to 1, and adjusting the output values from the other output nodes to values close to 0, may be used as the teacher data.
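  • The collection rule described above (store a training pair only when the evaluation value rose by at least ΔZs, with the teacher output pushed toward 1 for the chosen policy and toward 0 elsewhere) might look like the following sketch; the threshold, the epsilon value, and the data layout are assumptions.

```python
DELTA_ZS = 0.05          # assumed threshold for the increase of the evaluation value
training_data = []       # stand-in for the environment control database 101

def maybe_store_training_pair(z_prev, z_now, input_data_prev, model_output_prev,
                              chosen_policy_index, eps=0.01):
    """If Z increased by at least DELTA_ZS, store (input data, adjusted teacher data)."""
    if z_now - z_prev < DELTA_ZS:
        return False
    # Teacher data: the chosen policy's output node adjusted close to 1, the others close to 0.
    teacher = [eps] * len(model_output_prev)
    teacher[chosen_policy_index] = 1.0 - eps
    training_data.append((input_data_prev, teacher))
    return True

# Example: the previous step's control raised Z from 0.50 to 0.58, so the pair is stored.
stored = maybe_store_training_pair(
    z_prev=0.50, z_now=0.58,
    input_data_prev=[0.4, 0.7, 0.6, 0.35],   # (E[t-1], G[t-1], C[t-1]) as features
    model_output_prev=[0.2, 0.7, 0.1, 0.4],  # learning model's output at t-1
    chosen_policy_index=1,                   # policy chosen as Ae[t-1]
)
print(stored, training_data)
```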
  • By repeating the acquisition of the environment information E, the user state G, and the learning information C, the calculation of the control policy (analysis result Ae) based on them, and the calculation of the evaluation value Z, the environment data analysis unit 82 collects training data and stores it in the environment control database 101.
  • When a predetermined number or more of training data have been accumulated in the environment control database 101, the environment data analysis unit 82 trains the learning model to be trained using the training data stored in the environment control database 101.
  • The learning model to be trained may be the learning model for collecting training data, or may be a learning model whose parameters (weights and biases) are at their initial values. Further, any of batch learning, mini-batch learning, and online learning may be adopted as the method of training the learning model using the training data.
  • The environment data analysis unit 82 stores the trained learning model generated by this training in the environment control database 101 as a learning model specialized for the user.
  • From the user's next learning session onward, the environment data analysis unit 82 reads the learning model specialized for the user from the environment control database 101 and uses it for calculating the control policy (analysis result Ae).
  • Even when using the learning model specialized for the user, the environment data analysis unit 82 may collect training data in the same manner as when using the learning model for collecting training data, and may increase the amount of training data accumulated in the environment control database 101.
  • In that case, the environment data analysis unit 82 may train the learning model using all the training data accumulated in the environment control database 101, or may further train the learning model, which was trained with the training data up to the previous time, using only the training data added since the previous training.
  • FIG. 8 is a flowchart illustrating the learning model generation process performed by the environmental data analysis unit 82 in the first processing example.
  • In step S31, when the user learns using the information processing device 11 for the first time (when a learning model specialized for the user has not yet been generated), the environment data analysis unit 82 reads the learning model for collecting training data from the environment control database 101. The process proceeds from step S31 to step S32.
  • In step S32, the environment data analysis unit 82 sets the time t to 0. The process proceeds from step S32 to step S33.
  • In step S33, the environment data analysis unit 82 acquires the environment information E[t] from the learning environment sensing unit 81, the user state G[t] from the user state sensing unit 62, and the learning information C[t] from the learning data analysis unit 63. The process proceeds from step S33 to step S34.
  • In step S34, the environment data analysis unit 82 calculates the evaluation value Z[t] representing the goodness of the user's learning state based on the learning information C[t] acquired in step S33. The process proceeds from step S34 to step S35.
  • In step S35, the environment data analysis unit 82 uses the environment information E[t], the user state G[t], and the learning information C[t] acquired in step S33 as input data and, using the learning model for collecting training data read in step S31, calculates the analysis result Ae[t], which is the control policy for the learning environment. The process proceeds from step S35 to step S36.
  • step S36 the environment data analysis unit 82 supplies the analysis result Ae [t] calculated in step S35 to the control signal generation unit 91 of the environment control unit 83, and changes the learning environment according to the analysis result Ae [t]. ..
  • the process proceeds from step S36 to step S37.
  • the environmental data analysis unit 82 skips step S37 and step S38 and proceeds to step S39.
  • step S37 If it is determined in step S37 that the increase amount ⁇ Z [t] of the evaluation value is not equal to or greater than the predetermined threshold value ⁇ Zs, the process skips step S38 and proceeds to step S39.
  • step S37 If it is determined in step S37 that the increase amount ⁇ Z [t] of the evaluation value is equal to or greater than the predetermined threshold ⁇ Zs, the process proceeds to step S38, and the environmental data analysis unit 82 performs the environmental information E [1 hour before the step. t-1], the user state G [t-1], and the learning information C [t-1] are stored in the environment control database 101 as learning data. Further, the output of the learning model when the environmental information E [t-1], the user state G [t-1], and the learning information C [t-1] are input as input data to the learning model for collecting training data. The data or the adjusted data of the output data is stored in the environment control database 101 as learning data (teacher data). The process proceeds from step S38 to step S39.
  • In step S39, the environment data analysis unit 82 determines whether or not the user's learning has been completed.
  • If it is determined in step S39 that the user's learning has not been completed, the process proceeds from step S39 to step S40.
  • In step S40, the environmental data analysis unit 82 waits until the time for one time step has elapsed from the time when the process of step S33 was started, and the time t represented by the number of time steps is incremented by 1. The process returns from step S40 to step S33, and steps S33 to S40 are repeated.
  • If it is determined in step S39 that the user's learning has been completed, the process proceeds from step S39 to step S41.
  • In step S41, the environmental data analysis unit 82 trains the learning model for learning using the learning data stored in the environment control database 101, generates a learning model specialized for the user, and stores it in the environment control database 101.
  • If the number of learning data stored in the environment control database 101 has not yet reached the predetermined number, the environment data analysis unit 82 uses the learning model for collecting learning data again at the user's next learning session and performs the processes of steps S31 to S40 to collect further learning data. Then, when the number of learning data stored in the environment control database 101 exceeds the predetermined number, the environment data analysis unit 82 performs the process of step S41 to generate the learning model specialized for the user.
  • When the process of step S41 is completed, the process of this flowchart ends.
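The loop of FIG. 8 might be sketched as follows in Python. The sensor and controller interfaces (sensors.read(), sensors.learning_finished(), env_control.apply(), the evaluate() function that computes Z from C, and the threshold and step length) are assumptions introduced for illustration; the helpers make_teacher_vector() and train_user_model() are the ones sketched above, and selecting the control policy by the largest output value is likewise an assumption.

```python
import time
import numpy as np

def collect_learning_data(model, sensors, env_control, database, evaluate,
                          delta_zs=0.1, step_seconds=60.0, min_samples=100):
    """Sketch of steps S31-S41 of FIG. 8 (data collection, first example).

    sensors.read() is assumed to return (E, G, C) as numeric vectors for
    the current time step, evaluate(C) to compute the evaluation value Z,
    and env_control.apply(policy_index) to change the learning environment.
    """
    prev_x, prev_policy, prev_z = None, None, None
    while not sensors.learning_finished():                 # S39
        started = time.time()
        E, G, C = sensors.read()                           # S33
        z = evaluate(C)                                    # S34
        x = np.concatenate([E, G, C])
        q = model.predict(x.reshape(1, -1))[0]             # S35
        policy = int(np.argmax(q))
        env_control.apply(policy)                          # S36
        # S37/S38: store the previous step as learning data when the
        # evaluation value increased by at least delta_zs.
        if prev_x is not None and (z - prev_z) >= delta_zs:
            teacher = make_teacher_vector(
                model.predict(prev_x.reshape(1, -1))[0], prev_policy)
            database.append((prev_x, teacher))
        prev_x, prev_policy, prev_z = x, policy, z
        time.sleep(max(0.0, step_seconds - (time.time() - started)))  # S40
    # S41: train the user-specific model once enough data has accumulated.
    if len(database) >= min_samples:
        return train_user_model(database)
    return None
```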
  • FIG. 9 is a flowchart illustrating a process when the environment data analysis unit 82 calculates the analysis result Ae using the learned learning model specialized for the user in the first processing example.
  • In step S51, when a learning model specialized for the user has been generated, the environment data analysis unit 82 reads the trained learning model specialized for the user from the environment control database 101. The process proceeds from step S51 to step S52.
  • In step S52, the environmental data analysis unit 82 sets the time t to 0. The process proceeds from step S52 to step S53.
  • In step S53, the environment data analysis unit 82 acquires the environment information E [t] from the learning environment sensing unit 81, the user state G [t] from the user state sensing unit 62, and the learning information C [t] from the learning data analysis unit 63. The process proceeds from step S53 to step S54.
  • In step S54, the environment data analysis unit 82 uses the environment information E [t], the user state G [t], and the learning information C [t] acquired in step S53 as input data and calculates the analysis result Ae [t], which is the control policy for the learning environment, using the learning model read in step S51. The process proceeds from step S54 to step S55.
  • In step S55, the environment data analysis unit 82 supplies the analysis result Ae [t] calculated in step S54 to the control signal generation unit 91 of the environment control unit 83, and the learning environment is changed according to the analysis result Ae [t]. The process proceeds from step S55 to step S56.
  • In step S56, the environment data analysis unit 82 determines whether or not the user's learning has been completed.
  • If it is determined in step S56 that the user's learning has not been completed, the process proceeds to step S57.
  • In step S57, the environmental data analysis unit 82 waits until the time for one time step has elapsed from the time when the process of step S53 was started, and the time t represented by the number of time steps is incremented by 1. The process returns from step S57 to step S53, and steps S53 to S57 are repeated.
  • If it is determined in step S56 that the user's learning has been completed, the process of this flowchart ends.
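The inference-only loop of FIG. 9 could then look like the following sketch, reusing the same assumed sensor and controller interfaces; again, taking the policy with the largest output value as the analysis result Ae [t] is an assumption consistent with, but not spelled out in, the first processing example.

```python
import time
import numpy as np

def run_with_user_model(model, sensors, env_control, step_seconds=60.0):
    """Sketch of steps S51-S57 of FIG. 9 (control with the user-specific model)."""
    while not sensors.learning_finished():                 # S56
        started = time.time()
        E, G, C = sensors.read()                           # S53
        x = np.concatenate([E, G, C])
        q = model.predict(x.reshape(1, -1))[0]             # S54
        env_control.apply(int(np.argmax(q)))               # S55
        # S57: wait for the remainder of the one-time-step interval.
        time.sleep(max(0.0, step_seconds - (time.time() - started)))
```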
  • As described above, based on the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63, the environmental data analysis unit 82 calculates, using the learning model, a control policy for improving the learning state of the user, so that appropriate control of the learning environment is performed in consideration of the user's preference, personality, and the like, without any effort on the user's part.
  • In the second processing example, the environment data analysis unit 82 uses, as the learning model, a DNN that takes as input data the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63, and that outputs the value Q of every control policy a as output data, and calculates the analysis result Ae using this learning model.
  • The input data to the learning model may be any one or more of the environment information E, the user state G, and the learning information C.
  • Since the control policy a has the same meaning as the control policy explained in the first processing example of the environmental data analysis unit 82, the description thereof is omitted here.
  • The learning model is gradually updated into a learning model specialized for the user by reinforcement learning, but the initial learning model first used by the environmental data analysis unit 82 may be a learning model unrelated to the user (for example, an untrained learning model), or may be a learning model corresponding to the user's tendency toward learning. Since the initial learning model is the same as the learning model for collecting learning data in the first processing example of the environmental data analysis unit 82, the description thereof is omitted.
  • The value Q of a control policy a represents the goodness of the control policy a, and is calculated based on the reward V obtained when the learning environment is changed according to the control policy a.
  • The reward V represents the degree of goodness of the learning state of the user, like the evaluation value Z in the first processing example of the environmental data analysis unit 82.
  • The environmental data analysis unit 82 calculates the reward V based on the learning information C from the learning data analysis unit 63.
  • When the learning information C includes a plurality of pieces of information (elements), for example, when the learning information C includes the concentration level and the comprehension level, or when it includes information such as the acquisition speed in addition to the concentration level and the comprehension level, the environmental data analysis unit 82 uses the weighted average of those plural elements as the reward V.
  • The reward V may be a weighted average in which the weights of all elements of the learning information C other than one are set to 0; in this case, the reward V is the value of that single element of the learning information C.
  • The reward V takes a higher value as the user's learning state is better (as the degree of concentration and the degree of understanding are higher).
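The weighted-average reward described above can be written compactly; the following sketch is a plain illustration (the element values and weights are made-up examples):

```python
import numpy as np

def reward_from_learning_info(learning_info, weights):
    """Weighted average of the elements of the learning information C
    (e.g. concentration, comprehension, acquisition speed). Setting the
    weights of all elements but one to 0 reduces the reward V to the
    value of that single element."""
    values = np.asarray(learning_info, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.dot(values, w) / np.sum(w))

# Example: concentration 0.8 and comprehension 0.6 with equal weights.
print(reward_from_learning_info([0.8, 0.6], [1.0, 1.0]))   # -> 0.7
# Only the concentration element contributes when the other weight is 0.
print(reward_from_learning_info([0.8, 0.6], [1.0, 0.0]))   # -> 0.8
```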
  • The value Q of the control policy a is the sum of the reward V obtained in the learning environment after the learning environment is changed according to the control policy a and the discounted rewards V that will be obtained thereafter (the so-called Bellman equation). Specifically, the value Q [t] is given by the following equation (1).

  Q [t] = V [t + 1] + γ × Qmax [t + 1]   ... (1)

  • Here, the time t is a time represented by the number of time steps, as in the first processing example of the environmental data analysis unit 82.
  • The value Q [t] represents the value Q of the control policy a when the learning environment at a certain time t is changed according to the control policy a.
  • The control policy a at time t is hereinafter represented by a [t].
  • The reward V [t + 1] represents the reward V in the learning environment after the learning environment at time t is changed according to the control policy a [t].
  • Q [t + 1] represents the value Q of the control policy a [t + 1] when the learning environment at time t + 1 is changed according to the control policy a [t + 1], and is calculated by the learning model.
  • Qmax [t + 1] represents the maximum value among the values Q [t + 1] of all the control policies a [t + 1] at time t + 1.
  • γ is a discount rate (a value of 0 or more and 1 or less), which is a predetermined value.
  • The environment data analysis unit 82 inputs, to the learning model, the environment information E [t] from the learning environment sensing unit 81, the user state G [t] from the user state sensing unit 62, and the learning information C [t] from the learning data analysis unit 63 acquired at a certain time t, and calculates the values Q [t] of all the control policies a [t] as the output data of the learning model.
  • The learning model has output nodes corresponding to each of all the control policies a, and each output node outputs, for example, a value in the range of 0 to 1.
  • The output value from each output node of the learning model represents the value Q of the control policy a corresponding to that output node.
  • The environmental data analysis unit 82 determines, for example, the control policy a [t] that maximizes the value Q [t] among the values Q [t] of all the control policies a [t] output from the learning model as the analysis result Ae [t], and supplies it to the environment control unit 83.
  • The environment control unit 83 changes the learning environment based on the analysis result Ae [t].
  • Next, the environment data analysis unit 82 inputs, to the learning model, the environment information E [t + 1] from the learning environment sensing unit 81, the user state G [t + 1] from the user state sensing unit 62, and the learning information C [t + 1] from the learning data analysis unit 63 acquired at time t + 1, and calculates the values Q [t + 1] of all the control policies a [t + 1] as the output data of the learning model.
  • The environmental data analysis unit 82 supplies to the environment control unit 83, as the analysis result Ae at time t + 1, for example, the control policy ac [t + 1] whose value Q [t + 1] is the maximum among the values Q [t + 1] output from the learning model.
  • Further, the environmental data analysis unit 82 calculates the reward V [t + 1] based on the learning information C [t + 1]. The environmental data analysis unit 82 then calculates the value Q [t] of the control policy ac [t], which is the analysis result Ae [t] at time t, by the above equation (1), and sets the calculated value Q [t] as the correct answer value Q'[t] of the value of the control policy ac [t].
  • The learning model is trained so that the value Q [t] of the control policy ac [t] output from the learning model approaches the correct answer value Q'[t].
  • The environmental data analysis unit 82 performs reinforcement learning of the learning model while repeating the acquisition of the environment information E, the user state G, and the learning information C, the calculation of the analysis result Ae based on them, and the calculation of the value Q. As a result, the learning model is gradually updated into a learning model specialized for the user.
  • The environment data analysis unit 82 does not have to train the learning model every time it obtains the correct answer value Q'[t] of the output data of the learning model for the input of the environment information E [t], the user state G [t], and the learning information C [t].
  • In that case, the environment data analysis unit 82 stores the environment information E [t], the user state G [t], the learning information C [t], and the correct answer value Q'[t] of the output data of the learning model for those inputs in the environment control database 101 as learning data.
  • The environmental data analysis unit 82 then trains the learning model using the accumulated learning data every time a predetermined number of learning data have been accumulated.
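A single update in the spirit of equation (1) might be sketched as follows. The model is assumed to expose predict() and partial_fit() (for example the MLPRegressor stand-in used earlier); training only the output for the previously chosen control policy by copying the current outputs and overwriting that one entry with the target is a common implementation device and an assumption, not something prescribed by the text.

```python
import numpy as np

GAMMA = 0.9  # discount rate: a value between 0 and 1; the exact value is not given

def q_learning_update(model, x_prev, policy_prev, reward, x_now):
    """Refine the value of the control policy chosen one step earlier.

    Target for that policy: Q' = V + GAMMA * max_a Q(x_now, a), i.e. the
    form of equation (1). The other outputs keep their current values.
    """
    q_prev = model.predict(x_prev.reshape(1, -1))[0]
    q_now = model.predict(x_now.reshape(1, -1))[0]
    target = q_prev.copy()
    target[policy_prev] = reward + GAMMA * float(np.max(q_now))
    model.partial_fit(x_prev.reshape(1, -1), target.reshape(1, -1))
    return model
```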
  • FIG. 10 is a flowchart illustrating the processing when the environment data analysis unit 82 calculates the analysis result Ae using the learning model in the second processing example.
  • In step S71, when the user learns using the information processing device 11 for the first time (when reinforcement learning of the learning model has not yet been performed at all), the environment data analysis unit 82 reads the initial learning model from the environment control database 101. When reinforcement learning of the learning model has already been performed, the environment data analysis unit 82 reads the reinforcement-learned learning model from the environment control database 101. The process proceeds from step S71 to step S72.
  • In step S72, the environmental data analysis unit 82 sets the time t to 0. The process proceeds from step S72 to step S73.
  • In step S73, the environment data analysis unit 82 acquires the environment information E [t] from the learning environment sensing unit 81, the user state G [t] from the user state sensing unit 62, and the learning information C [t] from the learning data analysis unit 63. The process proceeds from step S73 to step S74.
  • In step S74, the environmental data analysis unit 82 calculates a reward V [t] representing the goodness of the learning state of the user based on the learning information C [t] acquired in step S73. The process proceeds from step S74 to step S75. Note that when the time t is 0, the process skips step S74 and proceeds to step S75.
  • In step S75, the environment data analysis unit 82 uses the environment information E [t], the user state G [t], and the learning information C [t] acquired in step S73 as input data and calculates, using the learning model, the analysis result Ae [t], which is the control policy ac [t] for the learning environment. The process proceeds from step S75 to step S76.
  • In step S76, the environment data analysis unit 82 supplies the analysis result Ae [t] calculated in step S75 to the control signal generation unit 91 of the environment control unit 83, and the learning environment is changed according to the analysis result Ae [t]. The process proceeds from step S76 to step S77.
  • In step S77, the environment data analysis unit 82 calculates the value Q [t-1] from the reward V [t] calculated in step S74 and the maximum value Qmax [t] of the values Q [t] of the control policies output from the learning model in step S75, that is, by the calculation formula obtained by replacing the time t with the time t-1 in the above equation (1), and sets the calculated value Q [t-1] as the correct answer value Q'[t-1] of the value for the control policy ac [t-1]. The process proceeds from step S77 to step S78. Note that when the time t is 0, the process skips step S77 and step S78 and proceeds to step S79.
  • In step S78, the environment data analysis unit 82 trains the learning model so that the value Q [t-1] for the control policy ac [t-1], which is output from the learning model when the environment information E [t-1], the user state G [t-1], and the learning information C [t-1] of one time step before are input as input data, approaches the correct answer value Q'[t-1]. Further, the environment data analysis unit 82 stores the trained learning model in the environment control database 101. The process proceeds from step S78 to step S79.
  • In step S79, the environment data analysis unit 82 determines whether or not the user's learning has been completed.
  • If it is determined in step S79 that the user's learning has not been completed, the process proceeds to step S80.
  • In step S80, the environmental data analysis unit 82 waits until the time for one time step has elapsed from the time when the process of step S73 was started, and the time t represented by the number of time steps is incremented by 1. The process returns from step S80 to step S73, and steps S73 to S80 are repeated.
  • If it is determined in step S79 that the user's learning has been completed, the processing of this flowchart ends.
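Putting the pieces together, the online loop of FIG. 10 might be sketched as follows, using the assumed interfaces from the earlier sketches and the q_learning_update() and reward_from_learning_info() helpers above. Note that, as in steps S77 and S78, the update performed at time t refines the value of the control policy that was chosen at time t-1.

```python
import time
import numpy as np

def reinforcement_loop(model, sensors, env_control, weights,
                       step_seconds=60.0):
    """Sketch of steps S71-S80 of FIG. 10 (second processing example)."""
    prev_x, prev_policy = None, None
    while not sensors.learning_finished():                 # S79
        started = time.time()
        E, G, C = sensors.read()                           # S73
        x = np.concatenate([E, G, C])
        v = reward_from_learning_info(C, weights)          # S74
        q = model.predict(x.reshape(1, -1))[0]             # S75
        policy = int(np.argmax(q))
        env_control.apply(policy)                          # S76
        if prev_x is not None:                             # S77/S78 (skipped at t = 0)
            model = q_learning_update(model, prev_x, prev_policy, v, x)
        prev_x, prev_policy = x, policy
        time.sleep(max(0.0, step_seconds - (time.time() - started)))  # S80
    return model
```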
  • As described above, also in the second processing example, the environment data analysis unit 82 calculates, using the learning model, a control policy for improving the learning state of the user based on the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63, so that appropriate control of the learning environment is performed in consideration of the user's taste, personality, and the like, without any effort on the user's part.
  • The information processing unit 12 may be a server device connected, by a communication line such as the Internet, to an information terminal (smartphone, personal computer, etc.) deployed in the user's learning space.
  • In that case, the input unit and the output unit provided in the information terminal function as substitutes for the input unit 24 and the output unit 25 of FIG. 1, and the information terminal functions as a device that mediates the exchange of information between the various sensors 13 and various environmental control devices 14 and the server device.
  • the present technology can also have the following configurations.
  • ⟨4⟩ The information processing apparatus according to any one of ⟨1⟩ to ⟨3⟩, wherein the processing unit calculates the change contents for the learning environment based on an evaluation value calculated based on any one or more of the concentration level, the comprehension level, and the learning speed.
  • ⁇ 5> The information processing apparatus according to any one of ⁇ 1> to ⁇ 4>, wherein the processing unit calculates the change contents with respect to the learning environment based on the current state of the learning environment.
  • ⟨6⟩ The information processing apparatus according to ⟨5⟩, wherein the state of the learning environment is one or more of the states of sound, image, illuminance, temperature, humidity, atmospheric pressure, the open/closed state of windows or doors, the clutter of the room, the presence or absence of others, the weather, and the time.
  • ⁇ 7> The information processing device according to any one of ⁇ 1> to ⁇ 6>, wherein the processing unit calculates the change contents with respect to the learning environment based on the state of the user.
  • ⟨8⟩ The information processing apparatus according to ⟨7⟩, wherein the state of the user is one or more of the states related to the position, behavior, orientation, pulse, sweating, brain waves, touch, smell, and taste of the user.
  • ⁇ 9> The information processing apparatus according to any one of ⁇ 1> to ⁇ 8>, wherein the processing unit calculates the changed content with respect to the learning environment using a learning model trained by machine learning.
  • ⁇ 10> The information processing apparatus according to ⁇ 9>, wherein the processing unit learns the learning model based on the learning data collected when the learning environment of the user is changed and the learning state of the user is improved.
  • ⟨11⟩ The information processing apparatus according to ⟨9⟩ or ⟨10⟩, wherein the processing unit calculates the value of each of the changes that can be made to the learning environment using the learning model, and determines the change to the learning environment based on the values.
  • ⁇ 12> The information processing apparatus according to any one of ⁇ 9> to ⁇ 11>, wherein the learning model is a deep neural network.
  • ⁇ 13> The information processing apparatus according to any one of ⁇ 1> to ⁇ 12>, further comprising a problem generation unit that presents a problem corresponding to the learning state of the user to the user.
  • ⁇ 14> The information processing apparatus according to any one of ⁇ 1> to ⁇ 13>, further comprising an environment control unit that changes the learning environment based on the change contents with respect to the learning environment calculated by the processing unit.
  • 11 Information processing device, 12 Information processing unit, 13 Various sensors, 14 Various environment control devices, 21 CPU, 24 Input unit, 25 Output unit, 41 Learning content control unit, 42 Learning environment control unit, 61 User interface unit, 62 User state sensing unit, 63 Learning data analysis unit, 64 Problem generation unit, 81 Learning environment sensing unit, 82 Environment data analysis unit, 83 Environment control unit, 91 Control signal generation unit, 101 Environment control database

Abstract

The present technology relates to an information processing device, an information processing method, and a program with which it is possible to improve the learning efficiency of a user. On the basis of the learning state of a user, the contents of a change to the learning environment of the user for improving the learning state are calculated.

Description

Information processing device, information processing method, and program
 本技術は、情報処理装置、情報処理方法、及び、プログラムに関し、特に、ユーザの学習効率の向上を図る情報処理装置、情報処理方法、及び、プログラムに関する。 This technology relates to an information processing device, an information processing method, and a program, and particularly to an information processing device, an information processing method, and a program for improving the learning efficiency of a user.
 特許文献1には、ユーザの学習度等に応じてユーザに提示する問題の難易度等を変更する技術が開示されている。 Patent Document 1 discloses a technique for changing the difficulty level of a problem presented to a user according to the learning level or the like of the user.
 特許文献2には、ユーザの脳波などからストレス指数を計り、ストレス指数に基づいてホワイトノイズを発生させてストレス緩和を図ることが開示されている。 Patent Document 2 discloses that a stress index is measured from a user's brain wave or the like, and white noise is generated based on the stress index to relieve stress.
 特許文献3には、電子ペンの状態からユーザの感情状態を推定する技術が開示されている。 Patent Document 3 discloses a technique for estimating a user's emotional state from the state of an electronic pen.
Patent Document 1: International Publication No. 2016/088463; Patent Document 2: Japanese Translation of PCT Application Publication No. 2017-528282; Patent Document 3: International Publication No. 2018/04306
Improving the learning efficiency of a user is a useful task, but because there are individual differences in taste, personality, and the like among users, it is difficult to achieve with a uniform measure such as simply generating white noise.
This technology was made in view of such a situation, and aims to improve the learning efficiency of the user.
The information processing device of one aspect of the present technology is an information processing device having a processing unit that calculates, based on the learning state of a user, the contents of a change to the learning environment of the user with which the learning state is improved; the program of one aspect of the present technology is a program for causing a computer to function as such an information processing device.
The information processing method of one aspect of the present technology is an information processing method in which the processing unit of an information processing device including the processing unit calculates, based on the learning state of a user, the contents of a change to the learning environment of the user with which the learning state is improved.
In the information processing device, the information processing method, and the program of one aspect of the present technology, the contents of a change to the learning environment of the user with which the learning state is improved are calculated based on the learning state of the user.
FIG. 1 is a block diagram showing a configuration example of an embodiment of an information processing device to which the present technology is applied.
FIG. 2 is a functional block diagram explaining the function of the information processing device of FIG. 1.
FIG. 3 is a diagram illustrating the types of sensors that the user state sensing unit can use for sensing the user state and the information (purpose of sensing) obtained by each sensor.
FIG. 4 is a diagram illustrating the elements of the learning environment sensed by the learning environment sensing unit and the types of sensors.
FIG. 5 is a diagram illustrating the elements of the learning environment controlled by the environment control unit and the types of environment control devices used to control each element.
FIG. 6 is a flowchart explaining a processing example performed by the information processing device of FIG. 2.
FIG. 7 is a functional block diagram explaining the details of a processing example of the learning environment control in the learning environment control unit.
FIG. 8 is a flowchart explaining the learning model generation process performed by the environment data analysis unit in the first processing example.
FIG. 9 is a flowchart explaining the process when the environment data analysis unit calculates the analysis result using the trained learning model specialized for the user in the first processing example.
FIG. 10 is a flowchart explaining the process when the environment data analysis unit calculates the analysis result using the learning model in the second processing example.
Hereinafter, embodiments of the present technology will be described with reference to the drawings.
<< An embodiment of an information processing device to which this technology is applied >>
FIG. 1 is a block diagram showing a configuration example of an embodiment of an information processing device to which the present technology is applied.
In FIG. 1, the information processing device 11 includes the information processing unit 12, various sensors 13, and various environmental control devices 14.
The information processing unit 12 includes a computer, and may be, for example, a personal computer, a smartphone, a notepad, a mobile phone, or the like.
The various sensors 13 include one or more types of sensors. The various sensors 13 include sensors that sense the user's state such as the position and behavior of the user, and sensors that sense the user's learning environment such as sound and temperature. The various sensors 13 are connected to the communication unit 27 or the connection port 28 (described later) of the information processing unit 12 and exchange information with the information processing unit 12.
The environmental control devices 14 include one or more types of devices that change the sound, temperature, and the like of the user's learning environment. The environmental control devices 14 are connected to the communication unit 27 or the connection port 28 (described later) of the information processing unit 12 and exchange information with the information processing unit 12.
The information processing unit 12 includes, for example, a CPU (Central Processing Unit) 21, a ROM (Read Only Memory) 22, a RAM (Random Access Memory) 23, an input unit 24, an output unit 25, a storage unit 26, a communication unit 27, a connection port 28, and a drive 29.
The CPU 21 controls all or part of the operation of each component of the information processing unit 12 via the bus 31 and the input/output interface 32, based on various programs recorded in the ROM 22, the RAM 23, the storage unit 26, or the removable media 30.
The ROM 22 stores programs read into the CPU 21, data used for calculation, and the like.
The RAM 23 temporarily stores programs read into the CPU 21 and various parameters that change as appropriate when the programs are executed.
The input unit 24 is a device with which the user inputs information, and may be, for example, a mouse, a keyboard, a touch panel, a microphone, buttons, switches, or the like.
The output unit 25 is a device that visually or audibly notifies the user of information, and may be, for example, a display device, an audio output device such as a speaker and headphones, a printer, a facsimile, or the like.
The storage unit 26 is a device for storing various types of data, and may be, for example, a magnetic storage device such as a hard disk drive, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
The communication unit 27 is a communication device for connecting to a network, and may be, for example, a wired LAN or wireless LAN device, Bluetooth (registered trademark), or the like.
The connection port 28 is a port for connecting an external device, and may be, for example, a USB port, an IEEE 1394 port, SCSI, an optical audio terminal, or the like.
The drive 29 is a device that reads or writes information from or to a removable medium 30 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
In the information processing unit 12, by mounting the removable media 30 storing a program executed by the CPU 21 in the drive 29, the program can be installed in the storage unit 26 via the input/output interface 32.
The program may be received by the communication unit 27 via a wired or wireless transmission medium and installed in the storage unit 26, or may be installed in the ROM 22 or the storage unit 26 in advance. Further, the program executed by the CPU 21 may be a program that is processed in chronological order in the order described in this specification, or may be a program that is processed in parallel or at a necessary timing such as when a call is made.
<Functional block diagram of the information processing device 11>
FIG. 2 is a functional block diagram explaining the function of the information processing device 11 of FIG. 1.
In FIG. 2, the information processing device 11 has a learning content control unit 41 and a learning environment control unit 42.
The learning content control unit 41 performs control for providing the user with problems according to the user's degree of understanding of the learning and the like.
The learning environment control unit 42 controls the environment (learning environment) of the learning space in which the user learns, in order to improve the learning state such as the degree of concentration of the user on learning.
(Learning content control unit 41)
The learning content control unit 41 includes a user interface unit 61, a user state sensing unit 62, a learning data analysis unit 63, and a problem generation unit 64.
The user interface unit 61 includes the input unit 24 and the output unit 25 of FIG. 1, presents information to the user, and receives information from the user.
The user interface unit 61 presents the problem q from the problem generation unit 64 to the user by the output unit 25. Further, when the user inputs information such as an answer to the problem q from the input unit 24, the user interface unit 61 supplies the answer information R, including the answer and the answer time information, to the learning data analysis unit 63.
The user solves the problem q from the problem generation unit 64 presented by the user interface unit 61 in the learning environment E'. Then, the user inputs the answer from the user interface unit 61.
The answer time is the time required for the user to solve the problem q. The user may input the answer time from the user interface unit 61, or the learning data analysis unit 63 may calculate the answer time based on the time from when the problem q is presented to the user by the user interface unit 61 until the answer is input from the user interface unit 61.
The user state sensing unit 62 includes some of the various sensors 13 shown in FIG. 1. The user state sensing unit 62 senses the state (user state G) of the user who is learning using the user interface unit 61, and supplies it to the learning data analysis unit 63 and the environment data analysis unit 82 of the learning environment control unit 42.
Note that, in consideration of the measurement error of each sensor and the like, the user state sensed by the user state sensing unit 62 is represented as the user state G, as distinguished from the actual user state G' of the user.
Further, the user state sensing unit 62 may include the arithmetic processing function of the CPU 21, and information obtained by the CPU 21 performing predetermined signal processing on the signals directly obtained from the various sensors 13 may be acquired as the user state G.
FIG. 3 is a diagram illustrating the types of sensors that the user state sensing unit 62 can use for sensing the user state and the information (purpose of sensing) obtained by each sensor.
In FIG. 3, the types of sensors that the user state sensing unit 62 can use for sensing the user state include a GPS (Global Positioning System), a camera, a motion sensor, a microphone, a biological information sensor, a depth sensor, an acceleration sensor, an angular velocity sensor, and the like.
The GPS senses the user's position when the user carries or wears it. By sensing the user's position, in addition to the place where the user is learning (at home, outside, etc.), the user's behavior, such as whether the user has stayed in the same position or has moved, can be grasped. The GPS may be one mounted on a mobile terminal such as a smartphone.
The camera is one or more cameras that capture the user's learning space. The user state sensing unit 62 senses the position of the user in the learning space based on the image obtained from the camera. In addition, the user's behavior can be grasped by sensing the user's position. Fine behaviors of the user, such as facial movements, can also be grasped from the image obtained from the camera.
The motion sensor is installed in the learning space and senses the position of the user in the learning space using infrared rays or the like. In addition, the user's behavior can be grasped by sensing the user's position.
The microphone is installed in the learning space and senses the user's voice. By sensing the user's voice, the state of fatigue of the user and the like can be grasped.
The biological information sensor senses a biological state such as the user's pulse, sweating, brain waves, touch, smell, or taste. By sensing the user's pulse, sweating, and brain waves, the degree of concentration of the user on learning and the like can be grasped.
Sensing the user's sense of touch, smell, or taste means sensing how strongly the user's awareness of touch, smell, or taste is working. By sensing the user's sense of touch, smell, or taste, the degree of concentration of the user on learning and the like can be grasped.
The depth sensor senses depth information (three-dimensional position including the depth direction) in the user's learning space. By sensing the depth information, the user's three-dimensional position and behavior can be grasped.
The acceleration sensor senses the user's acceleration when the user carries or wears it. By sensing the user's acceleration, in addition to the movement of the user's position, fine behaviors (movements) such as a change in posture that does not involve a movement of position can be grasped. The acceleration sensor may be mounted on a mobile terminal such as a smartphone carried by the user.
The angular velocity sensor senses the user's angular velocity when the user carries or wears it. By sensing the user's angular velocity, fine behaviors (movements) such as the user changing direction can also be grasped.
Note that the user state sensing unit 62 does not have to include all the types of sensors shown in FIG. 3, and may include other types of sensors as long as they sense the user state. Further, the user state to be detected may be any one or more of the states related to the user's position, behavior, orientation, pulse, sweating, brain waves, touch, smell, and taste.
In FIG. 2, the learning data analysis unit 63 is a functional block realized by the arithmetic processing of the CPU 21 of FIG. 1, and analyzes the learning state (the quality of learning) of the user, such as the degree of concentration on learning, the degree of comprehension (learning level), and the acquisition speed, based on the answer information R from the user interface unit 61 and the user state G from the user state sensing unit 62. The learning data analysis unit 63 supplies the analysis result AI representing the analyzed learning state of the user to the problem generation unit 64.
The degree of concentration of the user on learning means an index of the user's concentration. The learning data analysis unit 63 can obtain the degree of concentration based on, for example, the user state G from the user state sensing unit 62, and in particular may obtain it from the user's biological information. Further, the learning data analysis unit 63 may obtain the degree of concentration from the time elapsed since the user started learning, the time of day, the correct answer rate for the problems, the transition of the time required for the answers, and the like.
The degree of comprehension of the user's learning means an index of the user's understanding of a predetermined learning area. The learning area refers to a range of learning consisting of learning units such as grades, subjects, and units. The learning data analysis unit 63 obtains the degree of comprehension based on the correct answer rate of the problems, the time required for the answers, and the results of questionnaires about the problems.
The acquisition speed of the user's learning means an index of how quickly the user comes to understand a problem once learned. The learning data analysis unit 63 obtains the acquisition speed based on, for example, whether the user was able to correctly re-solve a problem previously answered incorrectly (and, if so, the time required for the answer), whether the user was able to solve a similar problem related to a correctly answered problem (and, if so, the time required for the answer), and the correct answer rate and the time required for the answer when an incorrectly answered problem is answered again.
Further, the learning data analysis unit 63 supplies part or all of the analyzed learning state (analysis result AI) to the environment data analysis unit 82 of the learning environment control unit 42 as the learning information C. In FIG. 2, of the analyzed degree of concentration, degree of comprehension, and acquisition speed of the user, the learning data analysis unit 63 supplies the degree of concentration and the degree of comprehension to the environment data analysis unit 82 as the learning information C.
However, the learning data analysis unit 63 may supply all of the degree of concentration, the degree of comprehension, and the acquisition speed, or any one or two of them, to the environment data analysis unit 82 as the learning information C.
Further, the learning information C supplied from the learning data analysis unit 63 to the environment data analysis unit 82 may be information other than the degree of concentration, the degree of comprehension, and the acquisition speed, as long as it is information representing the current learning state (quality of learning) of the user.
The problem generation unit 64 generates a problem according to the learning state of the user, for example, a problem with a difficulty level according to the learning state of the user, based on the analysis result AI from the learning data analysis unit 63, and supplies it to the user interface unit 61.
Here, for example, the technique described in Patent Document 1 (International Publication No. 2016/088463) may be applied to the learning content control unit 41.
However, the technique of Patent Document 1 does not have the learning environment control unit 42 of FIG. 2. Therefore, the technique of Patent Document 1 does not improve the learning efficiency by controlling the learning environment as the information processing device 11 to which the present technology is applied does.
Also, controlling the equipment in the room so that the environment is comfortable for the user does not necessarily improve the learning efficiency. On the other hand, in the information processing device 11 to which the present technology is applied, the learning environment control unit 42 acquires the learning information C including the learning state of the user from the learning content control unit 41 and changes the temperature of the learning environment and the like so that the learning state is improved, so that the learning efficiency can be improved appropriately.
(Learning environment control unit 42)
In FIG. 2, the learning environment control unit 42 includes a learning environment sensing unit 81, an environment data analysis unit 82, and an environment control unit 83.
The learning environment sensing unit 81 includes some of the various sensors 13 shown in FIG. 1. The learning environment sensing unit 81 senses the learning environment of the user and supplies environment information E representing the current state of the learning environment to the environment data analysis unit 82.
Note that the learning environment sensing unit 81 may include the arithmetic processing function of the CPU 21, and information obtained by the CPU 21 performing predetermined signal processing on the signals directly obtained from the various sensors 13 may be acquired as the environment information E.
FIG. 4 is a diagram illustrating the elements of the learning environment sensed by the learning environment sensing unit 81 and the types of sensors.
In FIG. 4, sound, video, illuminance, temperature, humidity, atmospheric pressure, the open/closed state of windows and doors, the clutter of the room, the presence or absence of others, the weather, and time (learning time, time of day) are shown as elements of the learning environment sensed by the learning environment sensing unit 81.
In sound sensing, for example, the loudness of sound such as noise in the learning space is detected by a microphone installed in the learning space.
In video sensing, for example, information on whether or not video is being displayed is acquired from a video display device (a display, etc.) deployed in the learning space, or from a video output device that supplies video to the video display device, and whether or not video is displayed on the video display device deployed in the learning space is detected.
In illuminance sensing, for example, the illuminance level of the learning space is detected by an illuminance sensor installed in the learning space. Instead of using an illuminance sensor, the illuminance level of the learning space may be detected by acquiring the information on the illuminance setting value of the lighting device that illuminates the learning space. The illuminance level of the learning space may also be detected by analyzing the image from a camera that captures the learning space.
In temperature sensing, the temperature level of the learning space is detected by a temperature sensor built into the air conditioning equipment of the learning space, or by a temperature sensor installed in the learning space separately from the air conditioning equipment.
In humidity sensing, the humidity level of the learning space is detected by a humidity sensor built into the air conditioning equipment of the learning space, or by a humidity sensor installed in the learning space separately from the air conditioning equipment.
In atmospheric pressure sensing, the atmospheric pressure level of the learning space is detected by an atmospheric pressure sensor built into the air conditioning equipment of the learning space, or by an atmospheric pressure sensor installed in the learning space separately from the air conditioning equipment.
In the sensing of the open/closed state of windows and doors, the open/closed state of the windows and doors that enclose the learning space is detected based on the image from a camera that captures the learning space, or by open/close sensors installed on the windows and doors.
In the sensing of the clutter of the room, the clutter of the room is detected based on the image from a camera that captures the space of the learning environment.
In the sensing of the presence or absence of others, whether or not a plurality of persons exist in the learning space is analyzed based on the image from a camera that captures the learning space, and the presence or absence of others is detected. The presence or absence of others may also be detected by a motion sensor or a depth sensor installed in the learning space.
In weather sensing, whether the weather is sunny, cloudy, or rainy is detected based on information from the humidity sensor, the temperature sensor, and the illuminance sensor. The weather information may also be obtained from an Internet site or the like.
In time sensing, time information is acquired from the clock function built into the information processing unit 12 or from a specific server on the Internet, and the learning time (elapsed time from the start of learning) and the current time are detected.
Note that the elements of the learning environment sensed by the learning environment sensing unit 81 may be any one or more of the elements of the learning environment shown in FIG. 4.
In FIG. 2, the environment data analysis unit 82 is a functional block realized by the arithmetic processing of the CPU 21 of FIG. 1, and analyzes the influence of the learning environment on the learning state of the user based on the environment information E from the learning environment sensing unit 81, the user state G from the user state sensing unit 62, and the learning information C from the learning data analysis unit 63.
Then, based on the result of the analysis, the environment data analysis unit 82 calculates the contents of a change to the learning environment with which the learning state of the user is improved over the present. The environment data analysis unit 82 supplies the control contents for the learning environment (the contents of the change to the learning environment) for changing to a learning environment in which the learning state of the user is improved to the environment control unit 83 as the analysis result Ae.
The environment control unit 83 includes the arithmetic processing function of the CPU 21 of FIG. 1 and the various environmental control devices 14. The environment control unit 83 controls the various environmental control devices 14 of FIG. 2 based on the analysis result Ae from the environment data analysis unit 82, and thereby controls the learning environment E'.
 図5は、環境制御部83が制御する学習環境の要素と、各要素の制御に用いられる環境制御機器の種類とを例示した図である。 FIG. 5 is a diagram illustrating the elements of the learning environment controlled by the environment control unit 83 and the types of environment control devices used to control each element.
 図5には環境制御部83が制御する学習環境の要素として、ノイズ、音楽、照度、温度、湿度、情報提示(映像)、連絡、及び、部屋の散らかり具合が例示されている。ノイズと音楽とは、いずれも学習環境の音に関する要素であるが、学習環境の制御では別の要素としている。 FIG. 5 illustrates noise, music, illuminance, temperature, humidity, information presentation (video), communication, and room clutter as elements of the learning environment controlled by the environment control unit 83. Noise and music are both elements related to the sound of the learning environment, but they are separate elements in the control of the learning environment.
 ノイズの制御では、学習空間に配備されたスピーカやユーザが装着するヘッドフォンに接続されたオーディオ機器により、ノイズキャンセルを行うか否かが制御される。オーディオ機器は、ノイズキャンセルを行う場合にはノイズと逆位相の音をスピーカ又はヘッドフォンに供給する。 In noise control, whether or not noise cancellation is performed is controlled by an audio device connected to a speaker installed in the learning space or a headphone worn by the user. When canceling noise, the audio device supplies a sound having a phase opposite to the noise to the speaker or headphones.
 音楽の制御では、学習空間に配備されたスピーカやユーザが装着するヘッドフォンに接続されたオーディオ機器により、音楽を出力するか否かや、音楽の音量が制御される。なお、音楽の制御では、音楽のジャンルなどに基づいて選曲が制御されてもよい。 In music control, whether or not to output music and the volume of music are controlled by the speakers installed in the learning space and the audio equipment connected to the headphones worn by the user. In music control, music selection may be controlled based on the genre of music or the like.
In illuminance control, the illuminance level of the learning space is controlled by the lighting equipment installed in the learning space.
 温度の制御では、空調機器により、学習空間の温度の高さが制御される。 In temperature control, the temperature of the learning space is controlled by the air conditioner.
 湿度の制御では、空調機器により、学習空間の湿度の高さが制御される。 In humidity control, the humidity level of the learning space is controlled by the air conditioner.
 情報提示(映像)の制御では、学習空間に配備された映像表示機器(ディスプレイに映像を出力する機器)により、映像を表示するか否かが制御される。映像はユーザの理解度を深めるための情報であってもよいし、集中度を高める情報等であってもよい。 In the control of information presentation (video), whether or not to display the video is controlled by the video display device (device that outputs the video to the display) installed in the learning space. The video may be information for deepening the user's understanding, or may be information for increasing the concentration.
In the control of communication (intervention by others), a communication terminal controls whether or not to block contact from others, such as e-mails and telephone calls to the user, or whether or not to announce to others registered in advance that they should not contact the user. In addition, in the control of communication, a device that controls the lock of the door at the entrance to the learning space controls whether the door is locked or unlocked.
In the control of room clutter, whether or not to clean and tidy up the learning space is controlled by a cleaning robot, a tidying robot, or the like.
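As an illustration only, the correspondence in FIG. 5 between learning-environment elements and the environment control devices that change them could be represented in software as a simple lookup table. The following Python sketch is not part of the disclosure; the Element enumeration and the controller callbacks are hypothetical names introduced here for explanation.

```python
from enum import Enum, auto

class Element(Enum):
    # Learning-environment elements of FIG. 5 (illustrative enumeration).
    NOISE = auto()
    MUSIC = auto()
    ILLUMINANCE = auto()
    TEMPERATURE = auto()
    HUMIDITY = auto()
    INFO_PRESENTATION = auto()
    COMMUNICATION = auto()
    CLUTTER = auto()

# Hypothetical controller callbacks standing in for the environment control devices;
# each receives the control content decided for its element (e.g. "on", "raise").
def control_audio_noise(content): print("audio device: noise cancellation =", content)
def control_audio_music(content): print("audio device: music =", content)
def control_lighting(content):    print("lighting equipment: illuminance =", content)
def control_temperature(content): print("air conditioner: temperature =", content)
def control_humidity(content):    print("air conditioner: humidity =", content)
def control_display(content):     print("video display device: show video =", content)
def control_communication(content): print("communication terminal / door lock =", content)
def control_robot(content):       print("cleaning robot: tidy up =", content)

CONTROLLERS = {
    Element.NOISE: control_audio_noise,
    Element.MUSIC: control_audio_music,
    Element.ILLUMINANCE: control_lighting,
    Element.TEMPERATURE: control_temperature,
    Element.HUMIDITY: control_humidity,
    Element.INFO_PRESENTATION: control_display,
    Element.COMMUNICATION: control_communication,
    Element.CLUTTER: control_robot,
}
```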
<Processing example of the information processing device 11>
FIG. 6 is a flowchart illustrating an example of the processing performed by the information processing device 11 of FIG. 2.
 図6において、ステップS11乃至ステップS14は学習コンテンツ制御部41が行う学習コンテンツ制御の処理例を示し、ステップS15乃至ステップS17は学習環境制御部42が行う学習環境制御の処理例を示す。 In FIG. 6, steps S11 to S14 show a processing example of learning content control performed by the learning content control unit 41, and steps S15 to S17 show a processing example of learning environment control performed by the learning environment control unit 42.
In step S11, when the user solves a problem q in the learning environment E' and inputs an answer, the user interface unit 61 accepts the user's answer. The user interface unit 61 then supplies answer information R, which includes the user's answer and the answer time, to the learning data analysis unit 63. The process proceeds from step S11 to step S12.
In step S12, the user state sensing unit 62 senses the user state G' of the user who is studying and acquires the user state G. The user state sensing unit 62 then supplies the acquired user state G to the learning data analysis unit 63 of the learning content control unit 41 and to the environment data analysis unit 82 of the learning environment control unit 42. The process proceeds from step S12 to step S13.
In step S13, the learning data analysis unit 63 uses the answer information R supplied from the user interface unit 61 in step S11 and the user state G supplied from the user state sensing unit 62 in step S12 to analyze the user's learning state, such as the degree of concentration, the degree of comprehension (degree of learning), and the acquisition speed, and calculates an analysis result AI representing the user's learning state. The learning data analysis unit 63 supplies the calculated analysis result AI to the problem generation unit 64.
In addition, of the learning state obtained by the analysis, the learning data analysis unit 63 supplies, for example, the degree of concentration and the degree of comprehension to the environment data analysis unit 82 as learning information C. The process proceeds from step S13 to step S14.
 ステップS14では、問題生成部64は、ステップS13で学習データ解析部63から供給された解析結果AIに基づいて、ユーザの学習状態に応じた問題qを生成する。問題生成部64により生成された問題qはユーザインタフェース部61における出力部25(図1参照)に供給される。処理はステップS14の後、ステップS11に戻り、ステップS11乃至ステップS14を繰り返す。 In step S14, the problem generation unit 64 generates the problem q according to the learning state of the user based on the analysis result AI supplied from the learning data analysis unit 63 in step S13. The problem q generated by the problem generation unit 64 is supplied to the output unit 25 (see FIG. 1) in the user interface unit 61. The process returns to step S11 after step S14, and repeats steps S11 to S14.
 ステップS15では、学習環境制御部42の学習環境センシング部81は、ユーザが学習を行っている学習空間の環境(学習環境)をセンシングし、環境情報Eを取得する。学習環境センシング部81は、取得した環境情報Eを環境データ解析部82に供給する。処理はステップS15からステップS16に進む。 In step S15, the learning environment sensing unit 81 of the learning environment control unit 42 senses the environment (learning environment) of the learning space in which the user is learning, and acquires the environment information E. The learning environment sensing unit 81 supplies the acquired environmental information E to the environmental data analysis unit 82. The process proceeds from step S15 to step S16.
In step S16, the environment data analysis unit 82 uses the user state G supplied from the user state sensing unit 62 in step S12, the learning information C supplied from the learning data analysis unit 63 in step S13, and the environment information E supplied from the learning environment sensing unit 81 in step S15 to analyze whether the current degree of concentration has improved compared with the degree of concentration indicated by the learning information C previously supplied from the learning data analysis unit 63.
The environment data analysis unit 82 then calculates, as the analysis result Ae, the next change (control content) to be made to the learning environment so that the degree of concentration improves. The environment data analysis unit 82 may also analyze whether the learning state has improved by taking into account not only the improvement in the degree of concentration but also the user's degree of comprehension, and may calculate, as the analysis result Ae, a change to the learning environment that improves the learning state. The process proceeds from step S16 to step S17.
In step S17, the environment control unit 83 controls the learning environment based on the analysis result Ae supplied from the environment data analysis unit 82 in step S16 so that it becomes a learning environment E' suited to the user's learning. After step S17, the process returns to step S15, and steps S15 to S17 are repeated.
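To make the flow of FIG. 6 easier to follow, a minimal Python sketch of the two loops is given below. It assumes hypothetical component objects (ui, user_sensing, learning_analysis, problem_gen, env_sensing, env_analysis, env_control) standing in for the units 61 to 64 and 81 to 83; their interfaces are illustrative and are not taken from the disclosure.

```python
def learning_content_loop(ui, user_sensing, learning_analysis, problem_gen, env_analysis):
    # Steps S11 to S14 of FIG. 6 (component objects are hypothetical stand-ins).
    while not ui.learning_finished():
        answer_info_r = ui.accept_answer()                        # step S11
        user_state_g = user_sensing.sense()                       # step S12
        result_ai, learning_info_c = learning_analysis.analyze(   # step S13
            answer_info_r, user_state_g)
        env_analysis.set_learning_info(learning_info_c)           # hand C to unit 82
        ui.present(problem_gen.generate(result_ai))               # step S14

def learning_environment_loop(ui, user_sensing, env_sensing, env_analysis, env_control):
    # Steps S15 to S17 of FIG. 6, run alongside the content loop.
    while not ui.learning_finished():
        env_info_e = env_sensing.sense()                          # step S15
        analysis_ae = env_analysis.analyze_environment(           # step S16
            env_info_e, user_sensing.latest(), env_analysis.get_learning_info())
        env_control.apply(analysis_ae)                            # step S17
```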
(Effect)
According to the information processing device 11 described above, the learning environment control unit 42 changes the learning environment based on the learning information C from the learning content control unit 41 so that the user's learning state improves. The learning environment is therefore appropriately changed to one suited to the user's learning, and learning efficiency is appropriately improved.
<Details of learning environment control>
FIG. 7 is a functional block diagram illustrating the details of a processing example of the learning environment control in the learning environment control unit 42.
FIG. 7 shows the environment data analysis unit 82 and the environment control unit 83 of FIG. 2, together with an environment control database 101 that is not shown in FIG. 2. The environment control database 101 is a functional block realized by the storage unit 26 of FIG. 1, and stores data and the like that the environment data analysis unit 82 refers to.
 さらに、図7には、環境制御部83の構成例が示されており、環境制御部83は、制御信号生成部91と、機器92A乃至機器92Nとを有する。 Further, FIG. 7 shows a configuration example of the environment control unit 83, and the environment control unit 83 includes a control signal generation unit 91 and devices 92A to 92N.
 制御信号生成部91は、図1のCPU21の演算処理により実現される機能ブロックである。制御信号生成部91は、環境データ解析部82からの解析結果Aeに基づいて機器92A乃至機器92Nの各々を制御する制御信号を生成する。環境データ解析部82からの解析結果Aeには、例えば、図5に示した学習環境の各要素についての変更内容(制御内容)の情報が含まれる。 The control signal generation unit 91 is a functional block realized by the arithmetic processing of the CPU 21 of FIG. The control signal generation unit 91 generates a control signal for controlling each of the devices 92A to 92N based on the analysis result Ae from the environment data analysis unit 82. The analysis result Ae from the environment data analysis unit 82 includes, for example, information on the change contents (control contents) for each element of the learning environment shown in FIG.
Each of the devices 92A to 92N is an environment control device used for controlling an element of the learning environment. Each of the devices 92A to 92N is associated with the learning-environment element that it can change.
The control signal generation unit 91 generates control signals for the devices 92A to 92N, which are associated with the respective elements of the learning environment, so that each element of the learning environment is changed according to the control content indicated by the analysis result Ae from the environment data analysis unit 82. The control signal generation unit 91 then supplies the generated control signals to the devices 92A to 92N to control them.
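A minimal sketch of how this dispatch might look in code, assuming the analysis result Ae is represented as a dictionary from element to control content and reusing the hypothetical Element/CONTROLLERS mapping sketched after the description of FIG. 5:

```python
def generate_control_signals(analysis_result_ae, controllers):
    # Stand-in for the control signal generation unit 91 driving devices 92A to 92N:
    # dispatch the control content decided for each element to its associated device.
    for element, content in analysis_result_ae.items():
        controller = controllers.get(element)
        if controller is not None:      # only elements that have an associated device
            controller(content)

# Example, reusing the Element/CONTROLLERS sketch above:
# generate_control_signals({Element.TEMPERATURE: "raise", Element.NOISE: "on"}, CONTROLLERS)
```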
 図7において例示された機器92A乃至機器92Nのうちの一部の環境制御機器の制御について説明する。 The control of some of the environmental control devices among the devices 92A to 92N exemplified in FIG. 7 will be described.
The device 92A is an environment control device used for controlling the noise of the learning environment shown in FIG. 5. The device 92A is, for example, an audio device that performs noise cancellation by outputting a sound with a phase opposite to the noise to the speakers installed in the learning space or the headphones worn by the user. The control signal generation unit 91 controls the device 92A according to the control content, included in the analysis result Ae, specifying whether or not to perform noise cancellation.
The device 92B is an environment control device used for controlling the music of the learning environment shown in FIG. 5. The device 92B is, for example, an audio device that outputs music to the speakers installed in the learning space or the headphones worn by the user. The control signal generation unit 91 controls the device 92B according to the control content, included in the analysis result Ae, specifying whether or not to output music.
The analysis result Ae may also include control content indicating that the volume of the music should be raised or lowered; in that case, the control signal generation unit 91 controls the device 92B so as to raise or lower the volume of the music by a fixed amount.
 また、解析結果Aeとして曲(音楽のジャンル等)を変更する制御内容が含まれていてもよく、その場合には、制御信号生成部91は機器92Bを制御して曲を変更する。 Further, the analysis result Ae may include a control content for changing a song (music genre, etc.). In that case, the control signal generation unit 91 controls the device 92B to change the song.
The device 92C is an environment control device used for controlling the illuminance of the learning environment shown in FIG. 5. The device 92C is, for example, the lighting equipment installed in the learning space. The control signal generation unit 91 controls the device 92C according to the control content, included in the analysis result Ae, indicating that the illuminance level should be raised, lowered, or maintained, and raises or lowers the illuminance level of the learning environment by a fixed amount or maintains it.
The device 92D is an environment control device used for controlling the temperature of the learning environment shown in FIG. 5, and is, for example, the air-conditioning equipment installed in the learning space. The control signal generation unit 91 controls the device 92D according to the control content, included in the analysis result Ae, indicating that the temperature should be raised, lowered, or maintained, and raises or lowers the temperature of the learning environment by a fixed amount or maintains it.
The device 92E is an environment control device used for controlling the information presentation (video) shown in FIG. 5, and is, for example, a video display device installed in the learning space. The control signal generation unit 91 controls the device 92E according to the control content, included in the analysis result Ae, specifying whether or not to display video, and thereby controls whether or not video is displayed in the learning space. The displayed information (video) includes, for example, information for deepening the user's understanding.
The device 92N is an environment control device used for controlling factors that break the user's concentration, and is, for example, a communication terminal used for the control of communication shown in FIG. 5. The control signal generation unit 91 controls the device 92N according to the control content, included in the analysis result Ae, specifying whether or not to block communication.
For example, using the device 92N, the control signal generation unit 91 controls whether or not to block contact from others, such as e-mails and telephone calls to the user, or whether or not to announce to others registered in advance that they should not contact the user. Another factor that breaks the user's concentration is other people entering and leaving the learning space, so the device 92N may also be, for example, a device (electric lock) that controls the lock of the door at the entrance to the learning space. In this case, the control signal generation unit 91 controls the device 92N according to the control content, included in the analysis result Ae, specifying whether or not to block communication, and thereby controls whether or not the door is locked.
The devices 92A to 92N shown in FIG. 7 are merely examples; the environment control unit 83 may include any one or more of the devices 92A to 92N, and may also include devices other than the devices 92A to 92N.
Although omitted from FIG. 7, the environment control unit 83 may further include, as environment control devices, a device that controls the humidity of the learning environment (for example, air-conditioning equipment) and a device that controls the clutter of the room (a robot or the like).
In this case, the control signal generation unit 91 controls the device that controls the humidity of the learning environment according to the control content, included in the analysis result Ae, indicating that the humidity level should be raised, lowered, or maintained, and raises or lowers the humidity level of the learning environment by a fixed amount or maintains it.
The control signal generation unit 91 also controls the device that controls the clutter of the room according to the control content, included in the analysis result Ae, indicating that the clutter of the room should be reduced, and thereby controls whether or not the room is cleaned and tidied.
Furthermore, according to the analysis result Ae, the environment control unit 83 may control the scent of the learning environment with, for example, an aroma diffuser, or may control the locking of a television, a game console, or a smartphone to reduce temptation.
According to the analysis result Ae, the environment control unit 83 may also set a fixed-line telephone to answering-machine mode or keep the intercom from ringing. In addition, according to the analysis result Ae, the environment control unit 83 may operate an espresso machine to encourage the user to take a break, or may control a humanoid robot or an animal-type robot to cheer the user on.
<Details of the processing of the environment data analysis unit 82>
The environment data analysis unit 82 takes as input data the environment information E from the learning environment sensing unit 81, the user state G from the user state sensing unit 62, and the learning information C from the learning data analysis unit 63, calculates as output data the analysis result Ae indicating the control content (changes) for each element of the environment, and supplies it to the control signal generation unit 91.
As shown in FIG. 4, the environment information E includes information on the sound, video, illuminance, temperature, humidity, and atmospheric pressure of the learning space, the open/closed state of windows and doors, the clutter of the room, the presence or absence of people other than the user, the weather, the time, and so on. Any one or more of these pieces of information is given to the environment data analysis unit 82 as the environment information E.
 ユーザ状態Gとしては、図3に示したように、ユーザの位置、行動、向き、脈拍、発汗、脳波、触覚、嗅覚、及び、味覚に関する状態がある。これらの状態のうちのいずれか1つ又は複数の状態の情報がユーザ状態Gとして環境データ解析部82に与えられる。 As the user state G, as shown in FIG. 3, there are states related to the user's position, behavior, orientation, pulse, sweating, brain wave, touch, smell, and taste. Information on one or more of these states is given to the environment data analysis unit 82 as the user state G.
The learning information C includes information on the learning state, such as the user's degree of concentration, degree of comprehension, and acquisition speed with respect to learning. Of this information on the learning state, for example, the information on the degree of concentration and the degree of comprehension is given to the environment data analysis unit 82 as the learning information C. However, any one or more pieces of the information on the learning state, such as the user's degree of concentration, degree of comprehension, and acquisition speed, may be given to the environment data analysis unit 82 as the learning information C.
The environment data analysis unit 82 compares each of the environment information E, the user state G, and the learning information C with its past values. For example, it calculates how much the user state G and the learning information C have changed with respect to the amount of change in the environment information E, and analyzes which elements of the learning environment influenced the changes in the user state G and the learning information C.
As a result of this analysis, the environment data analysis unit 82 calculates changes to the learning environment (changes to each element of the learning environment) for improving (increasing) the user's learning state, such as the degree of concentration and the degree of comprehension, and supplies them to the control signal generation unit 91 as the analysis result Ae.
If the user's learning state indicated by the learning information C deteriorates as a result of changing the learning environment, the environment data analysis unit 82 may switch, for the environment control device that controls the learning-environment element having a large influence on the learning state, to a control policy (control content) different from the previous one. For example, if the user's learning state deteriorates as a result of having the device 92B output music, the environment data analysis unit 82 may change the genre of the music output by the device 92B instead of having the device 92B stop outputting music.
If the user's learning state indicated by the learning information C improves as a result of changing the learning environment, the environment data analysis unit 82 sets a control policy that maintains the current state of the learning environment or changes each element of the current learning environment within a predetermined range. The environment data analysis unit 82 outputs the set control policy to the control signal generation unit 91 as the analysis result Ae, aiming to improve the user's learning state further.
The environment data analysis unit 82 also stores the control policy for the environment information E, the user state G, and the learning information C obtained while the user is studying, that is, the analysis result Ae, in the environment control database 101, building up a database.
By referring to the analysis results Ae stored in the environment control database 101 up to the previous time, the environment data analysis unit 82 can quickly bring the user's learning state into a good state regardless of the pattern of the environment information E and the user state G.
The environment data analysis unit 82 may use any one of the values representing the learning state, such as the degree of concentration, the degree of comprehension, or the acquisition speed, as an evaluation value representing the user's learning state, or may use a weighted average of the elements representing the learning state (the degree of concentration, the degree of comprehension, the acquisition speed, and so on) as the evaluation value.
 環境データ解析部82は、学習データ解析部63から供給される学習情報Cに基づいて評価値を算出することにより、評価値が大きい程、学習状態が良好であると判定することができる。 By calculating the evaluation value based on the learning information C supplied from the learning data analysis unit 63, the environmental data analysis unit 82 can determine that the larger the evaluation value, the better the learning state.
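A minimal sketch of such an evaluation value, assuming the learning information C is represented as a dictionary of learning-state elements; the element names and the default weights are illustrative assumptions, not values taken from the text.

```python
def evaluation_value(learning_info_c, weights=None):
    # Weighted average of the learning-state elements in the learning information C;
    # a higher value corresponds to a better learning state.
    weights = weights or {"concentration": 0.5, "comprehension": 0.3, "acquisition_speed": 0.2}
    used = {k: w for k, w in weights.items() if k in learning_info_c}
    if not used:
        return 0.0
    return sum(w * learning_info_c[k] for k, w in used.items()) / sum(used.values())

# Setting every weight except one to 0 reduces the evaluation value to that single element:
# evaluation_value({"concentration": 0.8, "comprehension": 0.6},
#                  weights={"concentration": 1.0, "comprehension": 0.0})   # -> 0.8
```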
 尚、環境データ解析部82は、環境制御部83により変更可能な学習環境の要素ごとに状態を変更させて評価値が最大となる状態に設定するようにしてもよい。 The environment data analysis unit 82 may change the state for each element of the learning environment that can be changed by the environment control unit 83 to set the state in which the evaluation value is maximized.
For the process of calculating the changes to the learning environment (the analysis result Ae, which is the control policy), the environment data analysis unit 82 may use a learning model based on a deep neural network (DNN) trained by deep learning or other machine learning. The learning method used in the machine learning may be supervised learning or reinforcement learning. The processing for calculating the analysis result Ae using a learning model is described below.
(First processing example of the environment data analysis unit 82)
As a first processing example of the environment data analysis unit 82, the case of using a learning model (DNN) trained by supervised learning using machine learning (deep learning) will be described.
When the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63 are input as input data, the environment data analysis unit 82 calculates the analysis result Ae using a learning model that outputs output data indicating an appropriate control policy for the learning environment (changes to the learning environment that improve the user's learning state).
Here, a control policy represents the changes (control contents) for all the elements of the learning environment to be changed (controlled), and there are as many control policies as there are combinations of the control contents that can be adopted for the respective elements.
For example, suppose that the elements of the learning environment to be controlled are temperature and humidity, and that there are three possible control contents for each, namely raise, lower, and maintain. In this case, there are nine combinations of control contents for temperature and humidity: (raise the temperature and raise the humidity), (raise the temperature and lower the humidity), (raise the temperature and maintain the humidity), ..., (maintain the temperature and maintain the humidity). Each of these nine combinations of control contents for temperature and humidity is one control policy, so nine control policies exist in total.
 また、学習モデルは全ての制御方針の各々に対応する出力ノードを有し、各出力ノードからは例えば0から1までの範囲の値が出力される。そして、各制御方針に対応する出力ノードからの出力値は、各制御方針の適性度を表す。 In addition, the learning model has output nodes corresponding to each of all control policies, and each output node outputs, for example, a value in the range of 0 to 1. Then, the output value from the output node corresponding to each control policy represents the appropriateness of each control policy.
 環境データ解析部82は、学習モデルの出力ノードから出力された適性度が最大となった制御方針をユーザの学習状態(例えば、集中度)を向上させる制御方針として決定する。また、環境データ解析部82は、決定した制御方針を解析結果Aeとして環境制御部83の制御信号生成部91に供給する。 The environmental data analysis unit 82 determines the control policy that maximizes the aptitude output from the output node of the learning model as the control policy that improves the learning state (for example, the degree of concentration) of the user. Further, the environmental data analysis unit 82 supplies the determined control policy as the analysis result Ae to the control signal generation unit 91 of the environmental control unit 83.
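The temperature/humidity example and the selection of the maximum-aptitude output node can be sketched as follows; the dictionary representation of a policy and the form of the model output are assumptions made for illustration only.

```python
import itertools

# The nine control policies of the temperature/humidity example: every combination
# of per-element control contents (raise / lower / maintain) is one policy.
ACTIONS = ("raise", "lower", "maintain")
POLICIES = [{"temperature": t, "humidity": h}
            for t, h in itertools.product(ACTIONS, ACTIONS)]

def select_policy(model_outputs, policies=POLICIES):
    # One output node per policy, each emitting an aptitude in the range 0 to 1;
    # the policy with the largest output is adopted as the analysis result Ae.
    best = max(range(len(policies)), key=lambda i: model_outputs[i])
    return policies[best]

# e.g. select_policy([0.1, 0.7, 0.2, 0.05, 0.3, 0.1, 0.2, 0.15, 0.4])
#      -> {"temperature": "raise", "humidity": "lower"}
```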
 なお、学習モデルへの入力データは、環境情報E、ユーザ状態G、及び、学習情報Cのうちのいずれか1つ又は複数の情報であってもよい。 The input data to the learning model may be any one or more of the environment information E, the user state G, and the learning information C.
Until a learning model specialized for the user is generated by machine learning (deep learning), the environment data analysis unit 82 collects learning data using a trained learning model stored in advance in the environment control database 101 (a learning model for collecting learning data).
 学習データ収集用の学習モデルは、ユーザとは無関係の学習モデル(例えば未学習の学習モデル)であってもよいし、ユーザの学習に対する傾向に対応した学習モデルであってもよい。 The learning model for collecting learning data may be a learning model unrelated to the user (for example, an unlearned learning model), or a learning model corresponding to the user's tendency toward learning.
When a learning model corresponding to the user's learning tendencies is used, the environment data analysis unit 82 has the user enter, from the input unit 24 (FIG. 1), the conditions under which the learning state (the degree of concentration and so on) improves for each element of the learning environment to be controlled, and acquires them as user information. The conditions under which the learning state improves for each element of the controlled environment are conditions that the user judges to improve the learning state for each element of the learning environment, for example, whether the user can concentrate better on learning with or without noise, whether the user can concentrate better on learning with music playing, and at about what temperature the user can concentrate on learning.
 一方、環境制御用データベース101には、同一又は類似のユーザ情報ごとに略適切な制御方針を算出する学習モデルが保存される。例えば、環境制御用データベース101には、他のユーザに対して生成された学習モデルとそのユーザのユーザ情報とが対応付けられて複数保存されている。環境データ解析部82は、ユーザから取得したユーザ情報と同一又は類似のユーザ情報に対応した学習モデルを環境制御用データベース101から読み出して学習データ収集用の学習モデルとして使用する。 On the other hand, the environment control database 101 stores a learning model that calculates a substantially appropriate control policy for each of the same or similar user information. For example, in the environment control database 101, a plurality of learning models generated for other users and user information of the user are stored in association with each other. The environment data analysis unit 82 reads a learning model corresponding to the same or similar user information as the user information acquired from the user from the environment control database 101 and uses it as a learning model for collecting learning data.
 また、環境データ解析部82は、学習データ収集用の学習モデルを使用して学習データの収集を行う際に、ユーザの学習状態の良好度の評価を行う。 Further, the environmental data analysis unit 82 evaluates the goodness of the learning state of the user when collecting the learning data using the learning model for collecting the learning data.
 その評価を行うために、環境データ解析部82は、学習データ解析部63からの学習情報Cに基づいて評価値Zを算出する。 In order to perform the evaluation, the environmental data analysis unit 82 calculates the evaluation value Z based on the learning information C from the learning data analysis unit 63.
When the learning information C contains a plurality of pieces of information (elements), for example when it contains the degree of concentration and the degree of comprehension, or when it also contains information such as the acquisition speed in addition to these, the environment data analysis unit 82 uses a weighted average of those elements as the evaluation value Z. The evaluation value Z may be a weighted average in which the weights of all but one of the elements of the learning information C are set to 0; in this case, the evaluation value Z is the value of that one element of the learning information C.
The evaluation value Z is higher the better the user's learning state is (the higher the degree of concentration and the degree of comprehension are), and the larger the increase in the evaluation value Z brought about by controlling the learning environment based on the analysis result Ae, the more appropriate the environment control based on that analysis result Ae was for improving the user's learning state.
Each time the environment data analysis unit 82 supplies the analysis result Ae to the environment control unit 83 and changes the learning environment, it acquires the environment information E from the learning environment sensing unit 81, the user state G from the user state sensing unit 62, and the learning information C from the learning data analysis unit 63.
Each time the learning environment is changed, the environment data analysis unit 82 also calculates the next control policy with the learning model, based on the newly acquired environment information E, user state G, and learning information C, and supplies it to the environment control unit 83 as the analysis result Ae.
 さらに、環境データ解析部82は、学習環境を変更するごとに、新たに取得した学習情報Cに基づいて評価値Zを算出する。 Further, the environment data analysis unit 82 calculates the evaluation value Z based on the newly acquired learning information C every time the learning environment is changed.
Here, the time from when the environment control unit 83 changes the learning environment at a certain time according to the analysis result Ae until the learning environment is next changed is defined as one time step, and the time t is expressed as a number of time steps. One time step is assumed to be longer than the time from when the environment control unit 83 changes the learning environment based on the analysis result Ae until that change in the learning environment appears as an effect on the user's learning state.
The environment information E, the user state G, and the learning information C acquired by the environment data analysis unit 82 at a certain time t (within the time step corresponding to time t) are denoted E[t], G[t], and C[t], respectively, and the evaluation value Z calculated based on the learning information C[t] is denoted Z[t]. The analysis result Ae calculated based on the environment information E[t], the user state G[t], and the learning information C[t] is denoted Ae[t].
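For the sketches that follow, the per-time-step quantities can be bundled into a simple record; the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class StepRecord:
    # The quantities indexed by the time step t in the text.
    env_info_e: Dict[str, Any]         # E[t]
    user_state_g: Dict[str, Any]       # G[t]
    learning_info_c: Dict[str, float]  # C[t]
    evaluation_z: float                # Z[t], calculated from C[t]
    analysis_ae: Dict[str, str]        # Ae[t], the control policy applied at time t
```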
At this time, the environment data analysis unit 82 obtains the increase ΔZ[t+1] of the evaluation value Z[t+1] at time t+1 over the evaluation value Z[t] at time t, and when the increase ΔZ[t+1] is greater than or equal to a predetermined threshold ΔZs, it determines that the control of the learning environment according to the analysis result Ae[t] at time t was an appropriate control.
When the increase ΔZ[t+1] of the evaluation value Z[t+1] over the evaluation value Z[t] is less than the predetermined threshold ΔZs, it determines that the control of the learning environment according to the analysis result Ae[t] was not an appropriate control.
When the environment data analysis unit 82 determines that the control of the learning environment according to the analysis result Ae[t] was an appropriate control, it stores the environment information E[t], the user state G[t], and the learning information C[t] in the environment control database 101 as the input data of a piece of learning data.
The environment data analysis unit 82 also stores, in the environment control database 101, the output data produced by the learning model for collecting learning data when the environment information E[t], the user state G[t], and the learning information C[t] are input to it as input data, as the teacher data of that piece of learning data.
The teacher data need not be the output data itself produced by the learning model for collecting learning data when the environment information E[t], the user state G[t], and the learning information C[t] are input as input data; it may be data obtained by adjusting that output data. For example, in the output data of the learning model, the output value of the output node corresponding to the control policy adopted as the analysis result Ae[t] may be adjusted to a value close to 1, the output values of the other output nodes may be adjusted to values close to 0, and the adjusted data may be used as the teacher data.
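A minimal sketch of this judgment and of the saving of a piece of learning data, reusing the StepRecord and POLICIES names introduced above; the concrete target values (0.9 and 0.05) and the database interface are assumptions.

```python
def maybe_store_training_sample(prev, z_now, policies, model_output_prev, delta_zs, database):
    # prev:              StepRecord for time t (E[t], G[t], C[t], Z[t], Ae[t])
    # z_now:             evaluation value Z[t+1] observed after the change made at time t
    # policies:          all control policies, in output-node order
    # model_output_prev: output of the data-collection model for the inputs at time t
    # delta_zs:          the threshold ΔZs
    # database:          hypothetical store with an add(inputs, teacher) method
    if z_now - prev.evaluation_z < delta_zs:
        return  # Ae[t] is judged not to have been an appropriate control
    inputs = (prev.env_info_e, prev.user_state_g, prev.learning_info_c)
    # Teacher data: push the node of the adopted policy toward 1 and the others toward 0
    # (the raw model output could be stored unchanged instead).
    teacher = [0.9 if p == prev.analysis_ae else 0.05 for p in policies]
    database.add(inputs, teacher)
```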
By repeatedly acquiring the environment information E, the user state G, and the learning information C, calculating the control policy (analysis result Ae) based on them, and calculating the evaluation value Z, the environment data analysis unit 82 collects learning data and accumulates it in the environment control database 101.
When the user finishes learning, for example, and a predetermined number or more of pieces of learning data have been accumulated in the environment control database 101, the environment data analysis unit 82 trains a learning model for training using the learning data accumulated in the environment control database 101.
The learning model for training may be the learning model for collecting learning data, or may be a learning model whose parameters (weights and biases) have their initial values. Any of batch learning, mini-batch learning, and online learning may be adopted as the method for training the learning model with the learning data.
The environment data analysis unit 82 then stores the trained learning model generated by training the learning model for training in the environment control database 101 as a learning model specialized for the user.
From the user's next learning session onward, the environment data analysis unit 82 reads the learning model specialized for the user from the environment control database 101 and uses it to calculate the control policy (analysis result Ae).
Even while using the learning model specialized for the user, the environment data analysis unit 82 may continue to collect learning data and increase the number of pieces of learning data accumulated in the environment control database 101, in the same way as when using the learning model for collecting learning data.
In this case, the environment data analysis unit 82 may redo the training of the learning model using all the learning data accumulated in the environment control database 101, or may further train the learning model trained with the learning data up to the previous time, using only the learning data added since the previous training.
 図8は、環境データ解析部82が第1の処理例において行う学習モデルの生成処理を説明するフローチャートである。 FIG. 8 is a flowchart illustrating the learning model generation process performed by the environmental data analysis unit 82 in the first processing example.
In step S31, when the user performs learning using the information processing device 11 for the first time (when a learning model specialized for the user has not yet been generated), the environment data analysis unit 82 reads the learning model for collecting learning data from the environment control database 101. The process proceeds from step S31 to step S32.
 ステップS32では、環境データ解析部82は、時刻tを0とする。処理はステップS32からステップS33に進む。 In step S32, the environmental data analysis unit 82 sets the time t to 0. The process proceeds from step S32 to step S33.
In step S33, the environment data analysis unit 82 acquires the environment information E[t] from the learning environment sensing unit 81, the user state G[t] from the user state sensing unit 62, and the learning information C[t] from the learning data analysis unit 63. The process proceeds from step S33 to step S34.
 ステップS34では、環境データ解析部82は、ステップS33で取得した学習情報C[t]に基づいてユーザの学習状態の良好度を表す評価値Z[t]を算出する。処理はステップS34からステップS35に進む。 In step S34, the environment data analysis unit 82 calculates an evaluation value Z [t] representing the goodness of the learning state of the user based on the learning information C [t] acquired in step S33. The process proceeds from step S34 to step S35.
In step S35, the environment data analysis unit 82 uses the environment information E[t], the user state G[t], and the learning information C[t] acquired in step S33 as input data and calculates the analysis result Ae[t], which is the control policy for the learning environment, using the learning model for collecting learning data read in step S31. The process proceeds from step S35 to step S36.
In step S36, the environment data analysis unit 82 supplies the analysis result Ae[t] calculated in step S35 to the control signal generation unit 91 of the environment control unit 83, and has the learning environment changed according to the analysis result Ae[t]. The process proceeds from step S36 to step S37.
In step S37, it is determined whether the increase ΔZ[t] (= Z[t] - Z[t-1]) of the evaluation value Z[t] calculated in step S34 over the evaluation value Z[t-1] calculated in step S34 one time step earlier is greater than or equal to the predetermined threshold ΔZs. When the time t is 0, the environment data analysis unit 82 skips steps S37 and S38 and proceeds to step S39.
 ステップS37において、評価値の増加量ΔZ[t]が所定の閾値ΔZs以上でないと判定された場合、処理はステップS38をスキップしてステップS39に進む。 If it is determined in step S37 that the increase amount ΔZ [t] of the evaluation value is not equal to or greater than the predetermined threshold value ΔZs, the process skips step S38 and proceeds to step S39.
If it is determined in step S37 that the increase ΔZ[t] in the evaluation value is greater than or equal to the predetermined threshold ΔZs, the process proceeds to step S38, and the environment data analysis unit 82 stores the environment information E[t-1], the user state G[t-1], and the learning information C[t-1] of one time step earlier in the environment control database 101 as learning data. It also stores, in the environment control database 101 as learning data (teacher data), the output data produced by the learning model for collecting learning data when the environment information E[t-1], the user state G[t-1], and the learning information C[t-1] are input to it as input data, or data obtained by adjusting that output data. The process proceeds from step S38 to step S39.
 ステップS39では、環境データ解析部82は、ユーザの学習が終了したか否かを判定する。 In step S39, the environment data analysis unit 82 determines whether or not the user's learning has been completed.
 ステップS39において、ユーザの学習が終了していないと判定された場合、処理はステップS39からステップS40に進む。 If it is determined in step S39 that the user's learning has not been completed, the process proceeds from step S39 to step S40.
In step S40, the environment data analysis unit 82 waits until the time corresponding to one time step has elapsed from when the processing of step S33 was started. It also increments the time t, expressed as a number of time steps, by 1. The process returns from step S40 to step S33, and steps S33 to S40 are repeated.
 一方、ステップS39において、ユーザの学習が終了したと判定された場合、処理はステップS39からステップS41に進む。 On the other hand, if it is determined in step S39 that the user's learning has been completed, the process proceeds from step S39 to step S41.
In step S41, the environment data analysis unit 82 trains the learning model for training using the learning data stored in the environment control database 101, generates a learning model specialized for the user, and stores it in the environment control database 101.
If the number of pieces of learning data accumulated in the environment control database 101 is smaller than a predetermined number, the environment data analysis unit 82 performs the processing of steps S31 to S40 using the learning model for collecting learning data in the user's next learning session as well, and collects learning data. When the number of pieces of learning data accumulated in the environment control database 101 becomes greater than or equal to the predetermined number, the environment data analysis unit 82 performs the processing of step S41 to generate the learning model specialized for the user.
 ステップS41の処理が終了すると、本フローチャートの処理が終了する。 When the process of step S41 is completed, the process of this flowchart is completed.
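Putting the pieces together, the flow of FIG. 8 might look roughly as follows. This compact sketch reuses the helpers introduced above (evaluation_value, select_policy, POLICIES, StepRecord, maybe_store_training_sample); the `components` object and its methods are hypothetical stand-ins for units 62, 63, 81, and 83 and for the environment control database 101, and are not taken from the disclosure.

```python
import time

def collect_and_train(components, step_seconds, delta_zs, min_samples):
    # Compact sketch of FIG. 8 (steps S31 to S41).
    model = components.database.load_collection_model()            # step S31
    prev, prev_out = None, None                                     # time t = 0 (S32)
    while not components.ui.learning_finished():                    # step S39
        e = components.env_sensing.sense()                          # step S33
        g = components.user_sensing.sense()
        c = components.learning_analysis.latest_learning_info()
        z = evaluation_value(c)                                      # step S34
        out = model.predict(e, g, c)                                 # step S35
        ae = select_policy(out)
        components.env_control.apply(ae)                             # step S36
        if prev is not None:                                         # steps S37 / S38
            maybe_store_training_sample(prev, z, POLICIES, prev_out,
                                        delta_zs, components.database)
        prev, prev_out = StepRecord(e, g, c, z, ae), out
        time.sleep(step_seconds)                                     # step S40: one time step
    if components.database.sample_count() >= min_samples:            # step S41
        user_model = components.database.train_user_model()
        components.database.save_user_model(user_model)
```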
<Processing when calculating the analysis result Ae using the trained learning model specialized for the user in the first processing example>
FIG. 9 is a flowchart illustrating the processing performed when the environment data analysis unit 82 calculates the analysis result Ae using the trained learning model specialized for the user in the first processing example.
 ステップS51では、ユーザに特化した学習モデルが生成されている場合、環境データ解析部82は、環境制御用データベース101からユーザに特化した学習済みの学習モデルを読み出す。処理はステップS51からステップS52に進む。 In step S51, when a user-specific learning model is generated, the environment data analysis unit 82 reads the user-specific learned learning model from the environment control database 101. The process proceeds from step S51 to step S52.
 ステップS52では、環境データ解析部82は、時刻tを0とする。処理はステップS52からステップS53に進む。 In step S52, the environmental data analysis unit 82 sets the time t to 0. The process proceeds from step S52 to step S53.
In step S53, the environment data analysis unit 82 acquires the environment information E[t] from the learning environment sensing unit 81, the user state G[t] from the user state sensing unit 62, and the learning information C[t] from the learning data analysis unit 63. The process proceeds from step S53 to step S54.
In step S54, the environment data analysis unit 82 uses the environment information E[t], the user state G[t], and the learning information C[t] acquired in step S53 as input data and calculates the analysis result Ae[t], which is the control policy for the learning environment, using the learning model read in step S51. The process proceeds from step S54 to step S55.
In step S55, the environment data analysis unit 82 supplies the analysis result Ae[t] calculated in step S54 to the control signal generation unit 91 of the environment control unit 83, and has the learning environment changed according to the analysis result Ae[t]. The process proceeds from step S55 to step S56.
 ステップS56では、環境データ解析部82は、ユーザの学習が終了したか否かを判定する。 In step S56, the environment data analysis unit 82 determines whether or not the user's learning has been completed.
 ステップS56において、ユーザの学習が終了していないと判定された場合、処理はステップS57に進む。 If it is determined in step S56 that the user's learning has not been completed, the process proceeds to step S57.
In step S57, the environment data analysis unit 82 waits until the time corresponding to one time step has elapsed from when the processing of step S53 was started. It also increments the time t, expressed as a number of time steps, by 1. The process returns from step S57 to step S53, and steps S53 to S57 are repeated.
 一方、ステップS56において、ユーザの学習が終了したと判定された場合、処理はステップS57をスキップして、本フローチャートの処理が終了する。 On the other hand, if it is determined in step S56 that the user's learning has been completed, the process skips step S57 and the process of this flowchart ends.
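The corresponding inference-only flow of FIG. 9 can be sketched under the same hypothetical component interfaces (and the same imports and helpers) as the FIG. 8 sketch above:

```python
def run_with_user_model(components, step_seconds):
    # Sketch of FIG. 9 (steps S51 to S57): with a user-specific model available,
    # each time step only performs inference and environment control.
    model = components.database.load_user_model()                   # step S51
    while not components.ui.learning_finished():                    # step S56
        e = components.env_sensing.sense()                          # step S53
        g = components.user_sensing.sense()
        c = components.learning_analysis.latest_learning_info()
        ae = select_policy(model.predict(e, g, c))                  # step S54
        components.env_control.apply(ae)                            # step S55
        time.sleep(step_seconds)                                    # step S57
```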
According to the first processing example of the environment data analysis unit 82 described above, a control policy that improves the user's learning state is calculated with the learning model on the basis of the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63, so that the learning environment is controlled appropriately, without effort on the user's part and in a manner that takes the user's preferences, personality, and the like into account.
(Second processing example of the environment data analysis unit 82)
As a second processing example of the environment data analysis unit 82, a case will be described in which a learning model (DNN) trained by reinforcement learning using machine learning (deep learning) is used.
The environment data analysis unit 82 calculates the analysis result Ae using, as the learning model, a DNN that, when the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63 are given as input data, outputs the value Q of every control policy a as output data. Note that the input data to the learning model may be any one or more of the environment information E, the user state G, and the learning information C.
The control policy a has the same meaning as the control policy described in the first processing example of the environment data analysis unit 82, so its description is omitted here.
The learning model is gradually updated by reinforcement learning into a model specific to the user, but the initial learning model first used by the environment data analysis unit 82 may be a learning model unrelated to the user (for example, an untrained learning model) or a learning model corresponding to the user's learning tendencies. The initial learning model is the same as the learning model for collecting learning data in the first processing example of the environment data analysis unit 82, so its description is omitted.
The value Q of a control policy a represents how good the control policy a is, and is calculated on the basis of the reward V obtained when the learning environment is changed in accordance with the control policy a.
Like the evaluation value Z in the first processing example of the environment data analysis unit 82, the reward V represents how good the user's learning state is. The environment data analysis unit 82 calculates the reward V on the basis of the learning information C from the learning data analysis unit 63.
When the learning information C contains a plurality of pieces of information (elements), for example the degree of concentration and the degree of comprehension, or additionally the speed of acquisition, the environment data analysis unit 82 uses a weighted average of those elements as the reward V. The reward V may also be a weighted average in which the weights of all but one element are set to 0; in that case, the reward V equals the value of that single element of the learning information C.
The reward V takes a higher value the better the user's learning state is (the higher the degree of concentration or comprehension).
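For illustration, the weighted averaging of the learning information C into the reward V described above might look like the following Python sketch; the element names and weight values are assumptions, not values taken from the embodiment.

```python
# Hypothetical computation of the reward V as a weighted average of the
# elements of the learning information C. Element names and weights are
# assumptions for illustration; all values are taken to lie in [0, 1],
# with higher values meaning a better learning state.
def reward_from_learning_info(C, weights=None):
    if weights is None:
        weights = {k: 1.0 for k in C}          # equal weighting by default
    total = sum(weights.get(k, 0.0) for k in C)
    return sum(weights.get(k, 0.0) * v for k, v in C.items()) / total

# Zeroing every weight except one reduces V to that single element:
V = reward_from_learning_info(
    {"concentration": 0.8, "comprehension": 0.6, "acquisition_speed": 0.7},
    weights={"concentration": 1.0, "comprehension": 0.0, "acquisition_speed": 0.0},
)
# V == 0.8
```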
The value Q of a control policy a is the discounted sum of the reward V in the learning environment obtained after changing the learning environment in accordance with that control policy a and the rewards V that will be obtained thereafter (the so-called Bellman equation).
Further, assuming that the optimal control policy a (the one that maximizes the value Q) continues to be adopted thereafter, the discounted reward sum that is the value Q of the control policy a is expressed by the following equation (1) (the so-called Bellman optimality equation).
Q[t] = V[t+1] + γQmax[t+1]   ... (1)
The time t is, as in the first processing example of the environment data analysis unit 82, a time expressed as a number of time steps. The value Q[t] is the value of a control policy a when the learning environment at time t is changed in accordance with that policy; the control policy a at time t is hereinafter denoted a[t]. The reward V[t+1] is the reward V in the learning environment after the learning environment at time t has been changed in accordance with the control policy a[t]. Q[t+1] is the value of the control policy a[t+1] when the learning environment at time t+1 is changed in accordance with a[t+1], and is calculated by the learning model. Qmax[t+1] is the maximum of the values Q[t+1] of all control policies a[t+1] at time t+1. γ is the discount rate (a value between 0 and 1 inclusive), which is determined in advance.
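Written as code, equation (1) amounts to a one-line helper that is later used to form the training target for the adopted control policy; the discount rate used here is only an assumed example value.

```python
# One-line helper for equation (1): the target value Q'[t] assigned to the
# adopted control policy ac[t], computed one time step later from the observed
# reward V[t+1] and the model outputs Q[t+1]. The default discount rate is an
# assumed example value.
def td_target(v_next, q_values_next, gamma=0.9):
    return v_next + gamma * max(q_values_next)   # Q'[t] = V[t+1] + γ·Qmax[t+1]
```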
The environment data analysis unit 82 inputs the environment information E[t] acquired from the learning environment sensing unit 81 at a certain time t, the user state G[t] from the user state sensing unit 62, and the learning information C[t] from the learning data analysis unit 63 into the learning model, and calculates the values Q[t] of all control policies a[t] as the output data of the learning model.
The learning model has an output node corresponding to each of the control policies a, and each output node outputs, for example, a value in the range from 0 to 1. The output value of each output node of the learning model represents the value Q of the control policy a corresponding to that node.
The environment data analysis unit 82 then supplies to the environment control unit 83, as the analysis result Ae[t], for example the control policy a[t] whose value Q[t] is the largest among the values Q[t] of all control policies a[t] output from the learning model.
Among the values Q[t] of all control policies a[t] output from the learning model, the maximum value is denoted Qmax[t], and the control policy adopted as the analysis result Ae[t] is denoted ac[t]. Note that a control policy a[t] whose value Q[t] is not the maximum value Qmax[t] may also be adopted as the analysis result Ae[t] at a low frequency.
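The selection of the control policy from the model outputs, including the low-frequency adoption of a non-maximal policy just mentioned, could be sketched as follows; the exploration probability is an assumed parameter.

```python
# Sketch of selecting the control policy from the Q values output by the
# learning model: normally the policy with the maximum value Qmax[t] (ac[t]),
# and, at a low frequency, a non-maximal policy instead. The exploration
# probability is an assumed parameter.
import random

def select_policy(q_values, explore_prob=0.05):
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    if len(q_values) > 1 and random.random() < explore_prob:
        return random.choice([i for i in range(len(q_values)) if i != best])
    return best                                  # index of ac[t]
```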
When the environment control unit 83 has changed the learning environment on the basis of the analysis result Ae[t], the environment data analysis unit 82 next inputs the environment information E[t+1] acquired from the learning environment sensing unit 81 at time t+1, the user state G[t+1] from the user state sensing unit 62, and the learning information C[t+1] from the learning data analysis unit 63 into the learning model, and calculates the value Q[t+1] of each of the control policies a[t+1] as the output data of the learning model. The environment data analysis unit 82 then supplies to the environment control unit 83, as the analysis result Ae at time t+1, for example the control policy ac[t+1] whose value Q[t+1] is the largest among the values Q[t+1] output from the learning model.
The environment data analysis unit 82 also calculates the reward V[t+1] on the basis of the learning information C[t+1]. It then calculates, by the above equation (1), the value Q[t] of the control policy ac[t] that was adopted as the analysis result Ae[t] at time t, and takes the calculated value Q[t] as the correct value Q'[t] of the value of the control policy ac[t].
The environment data analysis unit 82 then trains the learning model so that, when the environment information E[t], the user state G[t], and the learning information C[t] at time t are input to the learning model, the value Q[t] of the control policy ac[t] output from the learning model becomes the correct value Q'[t].
The environment data analysis unit 82 performs reinforcement learning of the learning model while repeatedly acquiring the environment information E, the user state G, and the learning information C, calculating the analysis result Ae based on them, and calculating the value Q. The learning model is thereby gradually updated into a learning model specific to the user.
Note that the environment data analysis unit 82 does not have to train the learning model every time it obtains the correct value Q'[t] of the output data of the learning model for the inputs of the environment information E[t], the user state G[t], and the learning information C[t].
For example, the environment data analysis unit 82 stores and accumulates, as learning data in the environment control database 101, the environment information E[t], the user state G[t], and the learning information C[t] together with the correct value Q'[t] of the output data of the learning model for those inputs.
The environment data analysis unit 82 trains the learning model using the accumulated learning data each time a predetermined number of pieces of learning data have been accumulated.
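One possible form of this accumulate-then-train procedure is the following PyTorch-style sketch. The network sizes, the batch threshold, and the storage of the adopted policy index alongside each (E[t], G[t], C[t], Q'[t]) tuple are assumptions made for illustration only.

```python
# PyTorch-style sketch of the accumulate-then-train procedure. Layer sizes,
# the batch threshold, and storing the index of the adopted policy ac[t]
# alongside each (E[t], G[t], C[t], Q'[t]) tuple are assumptions; the
# environment control database 101 is stood in for by an in-memory list.
import torch
import torch.nn as nn

N_POLICIES, IN_DIM, BATCH = 8, 16, 32

q_net = nn.Sequential(nn.Linear(IN_DIM, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, N_POLICIES))        # one output node per policy a
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = []                                             # accumulated learning data

def store_and_maybe_train(x, policy_idx, q_target):
    """x: encoded (E, G, C) features; q_target: the correct value Q'[t]."""
    buffer.append((x, policy_idx, q_target))
    if len(buffer) < BATCH:                             # not enough data yet
        return
    xs = torch.stack([b[0] for b in buffer])
    idx = torch.tensor([b[1] for b in buffer])
    tgt = torch.tensor([b[2] for b in buffer])
    q_pred = q_net(xs).gather(1, idx.unsqueeze(1)).squeeze(1)   # Q of ac[t]
    loss = nn.functional.mse_loss(q_pred, tgt)          # pull Q toward Q'
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    buffer.clear()
```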
FIG. 10 is a flowchart illustrating the processing performed when the environment data analysis unit 82 calculates the analysis result Ae using the learning model in the second processing example.
In step S71, if the user is learning with the information processing device 11 for the first time (that is, if no reinforcement learning of the learning model has been performed at all), the environment data analysis unit 82 reads the initial learning model from the environment control database 101. If reinforcement learning of the learning model has already been performed, the environment data analysis unit 82 reads the reinforcement-learned learning model from the environment control database 101. The process proceeds from step S71 to step S72.
In step S72, the environment data analysis unit 82 sets the time t to 0. The process proceeds from step S72 to step S73.
In step S73, the environment data analysis unit 82 acquires the environment information E[t] from the learning environment sensing unit 81, the user state G[t] from the user state sensing unit 62, and the learning information C[t] from the learning data analysis unit 63. The process proceeds from step S73 to step S74.
In step S74, the environment data analysis unit 82 calculates the reward V[t], which represents how good the user's learning state is, on the basis of the learning information C[t] acquired in step S73. The process proceeds from step S74 to step S75. Note that when the time t is 0, the process skips step S74 and proceeds to step S75.
In step S75, the environment data analysis unit 82 uses the learning model, with the environment information E[t], the user state G[t], and the learning information C[t] acquired in step S73 as input data, to calculate the analysis result Ae[t], which is the control policy ac[t] for the learning environment. The process proceeds from step S75 to step S76.
In step S76, the environment data analysis unit 82 supplies the analysis result Ae[t] calculated in step S75 to the control signal generation unit 91 of the environment control unit 83, which changes the learning environment in accordance with the analysis result Ae[t]. The process proceeds from step S76 to step S77.
In step S77, the maximum value Qmax[t] among the values Q[t] of the control policies output from the learning model in step S75 and the reward V[t] calculated in step S74 are added to calculate the value Q[t-1] of the control policy ac[t-1] that was adopted as the analysis result Ae[t-1] in step S75 one time step earlier. The calculated value Q[t-1] is then taken as the correct value Q'[t-1] of the value of the control policy ac[t-1]. The process proceeds from step S77 to step S78.
Note that the value Q[t-1] calculated by the processing of step S77 is the value given by equation (1) above with the time t replaced by the time t-1. Also, when the time t is 0, the process skips steps S77 and S78 and proceeds to step S79.
In step S78, the environment data analysis unit 82 trains the learning model so that, when the environment information E[t-1], the user state G[t-1], and the learning information C[t-1] from one time step earlier are input to the learning model as input data, the value Q[t-1] output from the learning model for the control policy ac[t-1] becomes the correct value Q'[t-1]. The environment data analysis unit 82 also stores the trained learning model in the environment control database 101. The process proceeds from step S78 to step S79.
In step S79, the environment data analysis unit 82 determines whether or not the user's learning has ended.
If it is determined in step S79 that the user's learning has not ended, the process proceeds to step S80.
In step S80, the environment data analysis unit 82 waits until the time corresponding to one time step has elapsed since the processing of step S73 was started, and then increments the time t, expressed as a number of time steps, by 1. The process returns from step S80 to step S73, and steps S73 to S80 are repeated.
On the other hand, if it is determined in step S79 that the user's learning has ended, the processing of this flowchart ends.
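For reference, the overall loop of FIG. 10 can be condensed into the following sketch, reusing the reward, target, and policy-selection helpers shown earlier; the sensing, feature-encoding, and training calls are assumed interfaces, and only the ordering of steps S73 to S80 is taken from the flowchart.

```python
# Condensed sketch of the FIG. 10 loop, reusing reward_from_learning_info,
# td_target and select_policy from the sketches above. read_inputs(),
# q_net_forward(), train_step() and apply_policy() are assumed interfaces;
# the per-step waiting of step S80 is omitted.
def online_q_loop(q_net_forward, train_step, read_inputs, apply_policy,
                  learning_finished, gamma=0.9):
    prev = None                                      # (x, ac) from time t-1
    t = 0
    while True:
        x, C = read_inputs()                         # S73: encoded E, G, C and C itself
        q_values = q_net_forward(x)                  # S75: Q[t] for every policy
        if prev is not None:                         # S74/S77/S78 are skipped at t == 0
            V = reward_from_learning_info(C)         # S74
            q_prime = td_target(V, q_values, gamma)  # S77: Q'[t-1]
            train_step(prev[0], prev[1], q_prime)    # S78: fit Q(ac[t-1]) to Q'[t-1]
        ac = select_policy(q_values)                 # S75: analysis result Ae[t]
        apply_policy(ac)                             # S76
        if learning_finished():                      # S79
            break
        prev = (x, ac)
        t += 1                                       # S80
```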
According to the second processing example of the environment data analysis unit 82, a control policy that improves the user's learning state is calculated with the learning model on the basis of the environment information E supplied from the learning environment sensing unit 81, the user state G supplied from the user state sensing unit 62, and the learning information C supplied from the learning data analysis unit 63, so that the learning environment is controlled appropriately, without effort on the user's part and in a manner that takes the user's preferences, personality, and the like into account.
In the information processing device 11 of FIG. 1 described above, the information processing unit 12 may be a server device connected via a communication line such as the Internet to an information terminal (a smartphone, personal computer, or the like) placed in the user's learning space. In that case, the input unit and the output unit of the information terminal function in place of the input unit 24 and the output unit 25 of FIG. 1, and the information terminal functions as a device that mediates the exchange of information between the various sensors 13 and the various environmental control devices 14 on the one hand and the server device on the other.
The present technology can also have the following configurations.
<1> An information processing device including a processing unit that calculates, on the basis of a learning state of a user, a change to the learning environment of the user that improves the learning state.
<2> The information processing device according to <1>, in which the learning state represents the quality of the user's learning.
<3> The information processing device according to <1> or <2>, in which the learning state includes one or more of the user's degree of concentration on, degree of comprehension of, and speed of acquisition in learning.
<4> The information processing device according to <3>, in which the processing unit calculates the change to the learning environment on the basis of an evaluation value calculated from one or more of the degree of concentration, the degree of comprehension, and the speed of acquisition.
<5> The information processing device according to any one of <1> to <4>, in which the processing unit calculates the change to the learning environment on the basis of the current state of the learning environment.
<6> The information processing device according to <5>, in which the state of the learning environment is one or more of states relating to sound, video, illuminance, temperature, humidity, atmospheric pressure, the open or closed state of a window or door, the untidiness of the room, the presence or absence of other people, the weather, and the time.
<7> The information processing device according to any one of <1> to <6>, in which the processing unit calculates the change to the learning environment on the basis of the state of the user.
<8> The information processing device according to <7>, in which the state of the user is one or more of states relating to the user's position, behavior, orientation, pulse, perspiration, brain waves, sense of touch, sense of smell, and sense of taste.
<9> The information processing device according to any one of <1> to <8>, in which the processing unit calculates the change to the learning environment using a learning model trained by machine learning.
<10> The information processing device according to <9>, in which the processing unit trains the learning model on the basis of learning data collected when the learning environment was changed and the user's learning state improved.
<11> The information processing device according to <9>, in which the processing unit uses the learning model to calculate a value for each of all the changes that can be made to the learning environment, and determines the change to the learning environment on the basis of the values.
<12> The information processing device according to any one of <9> to <11>, in which the learning model is a deep neural network.
<13> The information processing device according to any one of <1> to <12>, further including a problem generation unit that presents to the user a problem corresponding to the learning state of the user.
<14> The information processing device according to any one of <1> to <13>, further including an environment control unit that changes the learning environment on the basis of the change to the learning environment calculated by the processing unit.
<15> An information processing method in which the processing unit of an information processing device including the processing unit calculates, on the basis of a learning state of a user, a change to the learning environment of the user that improves the learning state.
<16> A program for causing a computer to function as a processing unit that calculates, on the basis of a learning state of a user, a change to the learning environment of the user that improves the learning state.
11 information processing device, 12 information processing unit, 13 various sensors, 14 various environmental control devices, 21 CPU, 24 input unit, 25 output unit, 41 learning content control unit, 42 learning environment control unit, 61 user interface unit, 62 user state sensing unit, 63 learning data analysis unit, 64 problem generation unit, 81 learning environment sensing unit, 82 environment data analysis unit, 83 environment control unit, 91 control signal generation unit, 101 environment control database

Claims (16)

1. An information processing device comprising a processing unit that calculates, on the basis of a learning state of a user, a change to the learning environment of the user that improves the learning state.
2. The information processing device according to claim 1, wherein the learning state represents the quality of the user's learning.
3. The information processing device according to claim 1, wherein the learning state includes one or more of the user's degree of concentration on, degree of comprehension of, and speed of acquisition in learning.
4. The information processing device according to claim 3, wherein the processing unit calculates the change to the learning environment on the basis of an evaluation value calculated from one or more of the degree of concentration, the degree of comprehension, and the speed of acquisition.
5. The information processing device according to claim 1, wherein the processing unit calculates the change to the learning environment on the basis of the current state of the learning environment.
6. The information processing device according to claim 5, wherein the state of the learning environment is one or more of states relating to sound, video, illuminance, temperature, humidity, atmospheric pressure, the open or closed state of a window or door, the untidiness of the room, the presence or absence of other people, the weather, and the time.
7. The information processing device according to claim 1, wherein the processing unit calculates the change to the learning environment on the basis of the state of the user.
8. The information processing device according to claim 7, wherein the state of the user is one or more of states relating to the user's position, behavior, orientation, pulse, perspiration, brain waves, sense of touch, sense of smell, and sense of taste.
9. The information processing device according to claim 1, wherein the processing unit calculates the change to the learning environment using a learning model trained by machine learning.
10. The information processing device according to claim 9, wherein the processing unit trains the learning model on the basis of learning data collected when the learning environment was changed and the user's learning state improved.
11. The information processing device according to claim 9, wherein the processing unit uses the learning model to calculate a value for each of all the changes that can be made to the learning environment, and determines the change to the learning environment on the basis of the values.
12. The information processing device according to claim 9, wherein the learning model is a deep neural network.
13. The information processing device according to claim 1, further comprising a problem generation unit that presents to the user a problem corresponding to the learning state of the user.
14. The information processing device according to claim 1, further comprising an environment control unit that changes the learning environment on the basis of the change to the learning environment calculated by the processing unit.
15. An information processing method in which the processing unit of an information processing device including the processing unit calculates, on the basis of a learning state of a user, a change to the learning environment of the user that improves the learning state.
16. A program for causing a computer to function as a processing unit that calculates, on the basis of a learning state of a user, a change to the learning environment of the user that improves the learning state.
PCT/JP2020/040771 2019-11-15 2020-10-30 Information processing device, information processing method, and program WO2021095561A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-207032 2019-11-15
JP2019207032 2019-11-15

Publications (1)

Publication Number Publication Date
WO2021095561A1 true WO2021095561A1 (en) 2021-05-20

Family

ID=75912504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/040771 WO2021095561A1 (en) 2019-11-15 2020-10-30 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2021095561A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7436594B1 (en) 2022-09-27 2024-02-21 デジタルアーツ株式会社 Information processing device, information processing method, information processing program, and information processing system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014063128A (en) * 2012-08-31 2014-04-10 Panasonic Corp Concentration level measurement device and program
WO2016088463A1 (en) * 2014-12-03 2016-06-09 ソニー株式会社 Information processing device, information processing method, and computer program
JP2017116933A (en) * 2015-12-24 2017-06-29 トムソン ライセンシングThomson Licensing Method and device for providing adapted learning information to user
JP2017144225A (en) * 2016-02-17 2017-08-24 パナソニックIpマネジメント株式会社 Biological information detection device
US20180322798A1 (en) * 2017-05-03 2018-11-08 Florida Atlantic University Board Of Trustees Systems and methods for real time assessment of levels of learning and adaptive instruction delivery
JP2019082311A (en) * 2017-10-30 2019-05-30 ダイキン工業株式会社 Concentration degree estimation device


Similar Documents

Publication Publication Date Title
KR102334942B1 (en) Data processing method and device for caring robot
JP6730843B2 (en) Communication support system
Kühnel et al. I'm home: Defining and evaluating a gesture set for smart-home control
JP5060978B2 (en) Information presentation system, program, information storage medium, and information presentation system control method
CN106235931A (en) Control the method and device of face cleaning instrument work
JP2010091228A (en) Air conditioner
JP7009342B2 (en) Devices, programs and methods that can evaluate meals based on the amount of chewing and smiling
WO2021095561A1 (en) Information processing device, information processing method, and program
CN112147911A (en) Equipment control method and device
CN113760100B (en) Man-machine interaction equipment with virtual image generation, display and control functions
Ramadan et al. The intelligent classroom: towards an educational ambient intelligence testbed
JP2012251731A (en) Air conditioning system
CN112113317A (en) Indoor thermal environment control system and method
CN110989370B (en) Intelligent household interaction control method and system
CN111260864A (en) Voice broadcast control method and device and air conditioner
JP7367530B2 (en) Information processing device, information processing method, and program
JP2020173787A (en) Information processing apparatus, information processing system, information processing method, and information processing program
CN213119453U (en) Indoor thermal environment control system
TW202022647A (en) Controlling system and controlling method for social network
CN113033336A (en) Home device control method, apparatus, device and computer readable storage medium
JP7085266B2 (en) Devices, programs and methods that can estimate satisfaction based on behavioral information
JP2021189573A (en) Information processing system and information processing method
Flessner et al. Perception based method for indoor air quality control
Dargie et al. A distributed architecture for reasoning about a higher-level context
JP5858228B2 (en) Identification apparatus, identification method, and identification program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20887166

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20887166

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP