WO2020246600A1 - Learning device, space control device, learning program, and space control program - Google Patents
Learning device, space control device, learning program, and space control program
- Publication number
- WO2020246600A1 (PCT/JP2020/022388)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spatial environment
- speaker
- conversation
- satisfaction level
- input data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
Definitions
- the present invention relates to a learning device, a space control device, a learning program, and a space control program.
- A person may learn the relationship between the "air" of a conversation (its situation, atmosphere, etc.) and the spatial environment from experiences in which that air improved or deteriorated. This allows a person to adjust the spatial environment when talking with others.
- The control of the spatial environment performed by humans includes opening and closing curtains, adjusting the brightness of the lighting, and selecting music to be played in the spatial environment.
- Patent Document 1 discloses a correlation strength table in which the correlation strengths representing the degree of correlation between a color and a predetermined word are associated with each other.
- an object of the present invention is to provide a learning device, a spatial control device, a learning program, and a spatial control program that enable control of the spatial environment in consideration of the air of conversation.
- The first feature of the present invention relates to a learning device including: an input data generation unit that generates input data in which the amount of speech in a conversation by a plurality of speakers at a predetermined time, the feature value of the conversation content, and the feature value of the spatial environment in which the speakers exist are associated with each other; and a model learning unit that learns a spatial environment model showing the correlation among a satisfaction level including the amount of speech, the feature value of the conversation content, and the feature value of the spatial environment.
- The input data may further include a body movement synchronization value that indexes the synchronization of movements by the plurality of speakers, and the satisfaction level may further include the body movement synchronization value.
- the input data may further include the number of nods by a plurality of speakers, and the satisfaction level may further include the number of nods.
- the input data may further include the amount of laughter by a plurality of speakers, and the satisfaction level may further include the amount of laughter.
- the characteristic value of the spatial environment may be the color or brightness of the spatial lighting.
- the characteristic value of the spatial environment may be the characteristic value of the sound of the spatial environment.
- The second feature of the present invention relates to a space control device including: a storage device that stores spatial environment model data specifying a spatial environment model showing the correlation among a satisfaction level including the amount of speech in a conversation by a plurality of speakers at a predetermined time, the feature value of the conversation content, and the feature value of the spatial environment in which the speakers exist; a current status analysis unit that calculates the current satisfaction level from the amount of speech in the current conversation by the speakers; and a spatial environment control unit that calculates, from the feature value of the current conversation content and the spatial environment model, a feature value of the spatial environment that raises the satisfaction level above the current satisfaction level, and controls the spatial environment based on the calculated feature value.
- The third feature of the present invention relates to a learning program that causes a computer to function as: an input data generation unit that generates input data in which the amount of speech in a conversation by a plurality of speakers at a predetermined time, the feature value of the conversation content, and the feature value of the spatial environment in which the speakers exist are associated with each other; and a model learning unit that learns a spatial environment model showing the correlation among a satisfaction level including the amount of speech, the feature value of the conversation content, and the feature value of the spatial environment.
- The fourth feature of the present invention relates to a space control program that causes a computer, which stores spatial environment model data specifying a spatial environment model showing the correlation among a satisfaction level including the amount of speech in a conversation by a plurality of speakers at a predetermined time, the feature value of the conversation content, and the feature value of the spatial environment in which the speakers exist, to function as: a current status analysis unit that calculates the current satisfaction level from the amount of speech in the current conversation by the speakers; and a spatial environment control unit that calculates, from the feature value of the current conversation content and the spatial environment model, a feature value of the spatial environment that raises the satisfaction level above the current satisfaction level, and controls the spatial environment based on the calculated feature value.
- According to the present invention, it is possible to provide a learning device, a space control device, a learning program, and a space control program that enable control of a spatial environment in consideration of the air of conversation.
- space means a place where a conversation is taking place.
- Spaces are, for example, offices, educational sites, houses, commercial facilities such as hotels, restaurants, department stores, hospitals, nursing care facilities, and various other places where speakers gather.
- A speaker means a person who speaks.
- a speaker includes a person who is speaking at a specific timing as well as a person who is listening.
- the speaker may be a person who discusses one theme, such as a person who participates in a discussion, or a person who happens to be present in a hotel lounge or the like.
- the learning device 1 learns the relationship between a conversation by a plurality of people and the spatial environment in which the conversation takes place, and generates a spatial environment model. Further, the learning device 1 refers to the generated spatial environment model and controls the spatial environment so as to improve the satisfaction level in the conversation.
- The learning device 1 is connected to various devices by wire or wirelessly, as shown in FIG. 1.
- The devices shown in FIG. 1 as connected to the learning device 1 are examples. Only some of the devices shown in FIG. 1 may be connected to the learning device 1, and devices other than those shown in FIG. 1 may be connected to it. Further, in another embodiment, each device shown in FIG. 1 may exchange data with the learning device 1 via a storage medium such as a USB (Universal Serial Bus) memory, without being connected to the learning device 1.
- The devices connected to the learning device 1 may be divided into devices for identifying the conversation and the state of the speakers, and devices for controlling the spatial environment.
- The devices for identifying the conversation and the state of the speakers are devices for grasping the current state of the space.
- The devices for identifying the conversation and the state of the speakers are, for example, a microphone 41, a body motion sensor 42, and a heartbeat fluctuation sensor 43, as shown in FIG. 1.
- The data acquired by each device is input to the learning device 1 and used to identify the conversation and the state of the speakers.
- In addition to the devices for identifying the conversation and the state of the speakers, devices for identifying the spatial environment in which the speakers are present, such as a thermo-hygrometer and an illuminance meter, may be included.
- The devices for controlling the spatial environment are devices that can change the spatial environment.
- The devices for controlling the spatial environment are, for example, a speaker 51, an aroma shooter 52, a lighting controller 53, and a projector 54, as shown in FIG. 1.
- The learning device 1 determines a spatial environment for improving conversation satisfaction from the spatial environment model, and controls each device based on the determined spatial environment.
- An air conditioner or the like may also be included as a device for controlling the spatial environment.
- the microphone 41 acquires voice data of conversation in space.
- one microphone 41 acquires the utterance of each speaker.
- one microphone may be provided for each speaker, and each microphone may acquire the utterance of each speaker individually.
- The body motion sensor 42 detects the body movement of the speaker.
- the body motion sensor 42 is provided for each speaker.
- the body motion sensor 42 detects the movement of the speaker during conversation.
- the body motion sensor 42 is, for example, a motion sensor provided on the collar of the speaker.
- the heartbeat fluctuation sensor 43 is a sensor that detects heartbeat fluctuation as an index of stress of the speaker.
- The heartbeat fluctuates under the influence of periodic brain stem activity synchronized with respiration, periodic brain stem activity synchronized with blood pressure fluctuation, and emotion.
- The periodic activity synchronized with blood pressure fluctuation appears at 0.04 to 0.15 Hz, that is, as the LF (Low Frequency) component with a cycle of about 10 seconds.
- The periodic activity synchronized with respiration appears at 0.15 to 0.4 Hz, that is, as the HF (High Frequency) component with a cycle of about 4 seconds.
- The sympathetic nerve acts as an amplification factor that increases the power of LF.
- The parasympathetic nerve acts as an amplification factor that increases the power of both LF and HF. Therefore, detecting the fluctuation of the heartbeat makes it possible to grasp the states of the sympathetic and parasympathetic nerves.
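- As an illustration of how LF and HF power could be obtained from a heartbeat series, the following is a minimal sketch. Only the band edges (0.04 to 0.15 Hz and 0.15 to 0.4 Hz) come from the text. The evenly sampled input, the sampling rate `fs`, the naive DFT, and the function names are assumptions made for the example, not the method of the heartbeat fluctuation sensor 43.

```python
import math

LF_BAND = (0.04, 0.15)  # Hz, band edges taken from the text
HF_BAND = (0.15, 0.40)  # Hz, band edges taken from the text


def band_power(samples, fs, band):
    """Sum naive-DFT power of an evenly sampled series over [lo, hi) Hz.

    Assumption: `samples` is a beat-interval series already resampled
    to a fixed rate of `fs` Hz.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC component
    lo, hi = band
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if lo <= freq < hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n)
                     for i, x in enumerate(centered))
            im = sum(x * math.sin(2 * math.pi * k * i / n)
                     for i, x in enumerate(centered))
            power += (re * re + im * im) / n
    return power


def lf_hf(samples, fs):
    """Return (LF power, HF power, LF/HF ratio)."""
    lf = band_power(samples, fs, LF_BAND)
    hf = band_power(samples, fs, HF_BAND)
    return lf, hf, (lf / hf if hf > 0 else float("inf"))
```

- A series dominated by a 0.1 Hz oscillation yields a large LF/HF ratio, while one dominated by a 0.25 Hz oscillation yields a small one.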
- the heartbeat fluctuation sensor 43 is provided for each speaker, and inputs the state of the heartbeat fluctuation of each speaker to the learning device 1.
- the measurement result by the heartbeat fluctuation sensor 43 may be input to the learning device 1, and the heartbeat fluctuation may be detected by the learning device 1.
- the speaker 51 outputs sound to the space.
- the speaker 51 outputs the sound specified by the learning device 1.
- the sound may be music or an environmental sound such as the sound of flowing water. Further, the sound is not limited to a sound that is constantly output such as music or an environmental sound, but may be a sound that is suddenly output such as a notification sound.
- the aroma shooter 52 is a device that outputs a scent to the space.
- the aroma shooter 52 outputs the scent designated by the learning device 1.
- the lighting controller 53 controls the brightness and color of the lighting fixtures provided in the space.
- the lighting controller 53 controls the brightness and color of the luminaire so as to obtain the brightness and color specified by the learning device 1.
- the lighting controller 53 may control one of the brightness and color of the luminaire provided in the space.
- the projector 54 displays an image on a wall of space or the like.
- the projector 54 displays an image designated by the learning device 1.
- the projector 54 may display an image by using a projection mapping technique.
- the learning device 1 is a general computer including a storage device 10, a processing device 20, and an input / output interface 30.
- the function shown in FIG. 1 is realized by executing a learning program by a general computer.
- The storage device 10 is a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk, or the like, and stores various data such as input data, output data, and intermediate data for the processing device 20 to execute processing.
- The processing device 20 is a CPU (Central Processing Unit) that reads and writes data stored in the storage device 10 and inputs and outputs data to and from the input / output interface 30 to execute processing in the learning device 1.
- The input / output interface 30 passes data from an input device (not shown) such as a mouse or keyboard to the processing device 20, and outputs data from the processing device 20 to an output device (not shown) such as a printer or display device.
- The input / output interface 30 is further connected to the devices for identifying the conversation and the state of the speakers, the devices for controlling the spatial environment, and the like, as described above.
- That is, the input / output interface 30 is an interface for connecting to each of these devices, such as the microphone 41, the body motion sensor 42, the heartbeat fluctuation sensor 43, the speaker 51, the aroma shooter 52, the lighting controller 53, and the projector 54.
- Devices not shown in FIG. 1, such as a thermo-hygrometer, an illuminance meter, and an air conditioner, may be included among the devices for identifying the conversation and the state of the speakers and the devices for controlling the spatial environment.
- the storage device 10 stores the learning program, and also stores the input data 11, the spatial environment model data 12, the current status data 13, the current status satisfaction data 14, and the spatial environment feature value data 15.
- the input data 11 is data that associates the conversational situation of the speaker, which changes with time, with the spatial environment.
- the input data 11 associates, for example, the amount of utterance in a conversation by a plurality of speakers at a predetermined time, the characteristic value of the conversation content, and the characteristic value of the spatial environment in which the speaker exists.
- the input data 11 is stored by the input data generation unit 21.
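- One way to picture a single entry of the input data 11 is as a record per predetermined time. The sketch below is illustrative only: the class name, field names, and types are hypothetical, and the optional fields stand for the additional items the text says may be included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputRecord:
    """One hypothetical row of input data 11 for one time window."""
    window_start_s: float              # start of the predetermined time window
    utterance_s: dict                  # speaker id -> total speech time (s)
    content_feature: list              # feature value of the conversation content
    environment: dict                  # feature values of the spatial environment
    body_sync: Optional[float] = None  # body movement synchronization value
    nod_count: Optional[int] = None    # total number of nods
    laughter_s: Optional[float] = None # total laughter time (s)


record = InputRecord(
    window_start_s=0.0,
    utterance_s={"A": 12.0, "B": 7.5},
    content_feature=[0.1, 0.2],
    environment={"brightness": 0.7, "color": "warm"},
)
```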
- the spatial environment model data 12 is data that identifies the spatial environment model learned by the model learning unit 22.
- the spatial environment model data 12 is stored by the model learning unit 22 and referred to by the spatial environment control unit 25.
- the spatial environment model shows the correlation between the satisfaction level including the amount of speech, the characteristic value of the conversation content, and the characteristic value of the spatial environment.
- the spatial environment model may also show correlations with other factors.
- the spatial environment model data only needs to be able to specify the spatial environment model, and has a data format corresponding to a learning method or the like.
- The current status data 13 is the data of the feature values of the spatial environment before the space is controlled by referring to the spatial environment model.
- The current status data 13 is stored by the current status data acquisition unit 23 and referred to by the spatial environment control unit 25.
- The current status satisfaction data 14 is data on the satisfaction level of the conversation by the speakers before the space is controlled by referring to the spatial environment model.
- the current status satisfaction data 14 is stored by the current status analysis unit 24 and referred to by the spatial environment control unit 25.
- the spatial environment feature value data 15 is the data of the spatial environment feature values calculated by the spatial environment control unit 25.
- the spatial environment feature value data 15 is stored by the spatial environment control unit 25.
- The spatial environment feature value data 15 is input to a device for changing the space, such as the speaker 51 shown in FIG. 1.
- The processing device 20 includes an input data generation unit 21, a model learning unit 22, a current status data acquisition unit 23, a current status analysis unit 24, and a spatial environment control unit 25.
- the input data generation unit 21 acquires data from each device connected to the learning device 1 via the input / output interface 30, generates input data 11, and stores it in the storage device 10.
- the input data generation unit 21 generates the input data 11 from the data for specifying the state of the conversation and the speaker and the data for specifying the state of the spatial environment in which the speaker exists.
- the input data generation unit 21 sequentially acquires data from a device provided in the space.
- The input data generation unit 21 aggregates the data acquired from each device at predetermined time intervals and calculates, for each predetermined time, the data identifying the conversation and the state of the speakers and the data identifying the state of the spatial environment in which the speakers exist.
- the predetermined time may be set for each data type included in the input data 11. For example, the amount of utterance may be totaled every 30 seconds, and the feature value of the conversation content may be totaled every minute.
- the input data generation unit 21 converts the data aggregated at predetermined time intervals into data that can be input to the model learning unit 22 to generate the input data 11.
- The data for identifying the conversation and the state of the speakers are the amount of utterance and the feature value of the conversation content in the conversation by a plurality of speakers at a predetermined time.
- the data for identifying the conversation and the state of the speaker may further include the body motion synchronization value of the plurality of speakers, the number of nods by the plurality of speakers, and the amount of laughter by the plurality of speakers.
- the amount of utterance is the amount of conversation by each speaker.
- the utterance amount is, for example, the amount of voice measured by the microphone 41.
- the amount of voice is, for example, the total speech time.
- the input data generation unit 21 specifies the utterance portion of the voice data acquired from the microphone 41, and calculates the total utterance time of each speaker as the utterance amount at predetermined time intervals.
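- The per-interval totaling described above can be sketched as follows. This assumes the utterance portions have already been identified as (speaker, start, end) segments in seconds. The segment format, the function name, and the 30-second default are illustrative choices, not details given in the text.

```python
import math
from collections import defaultdict


def utterance_amounts(segments, window_s=30.0, total_s=None):
    """Total each speaker's speech time per window.

    segments: list of (speaker, start_s, end_s) utterance portions
              (hypothetical format for this sketch).
    Returns one dict per window mapping speaker -> seconds spoken.
    """
    if total_s is None:
        total_s = max(end for _, _, end in segments)
    n = math.ceil(total_s / window_s)
    windows = [defaultdict(float) for _ in range(n)]
    for speaker, start, end in segments:
        first = int(start // window_s)
        last = min(math.ceil(end / window_s), n)
        for w in range(first, last):
            w_start, w_end = w * window_s, (w + 1) * window_s
            # Only the part of the segment inside this window counts.
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                windows[w][speaker] += overlap
    return [dict(w) for w in windows]
```

- A segment that straddles a window boundary is split, so each window only receives the speech time that falls inside it.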
- the input data generation unit 21 includes the utterance amount for each predetermined time in the input data 11.
- the feature value of the conversation content is calculated based on the conversation content specified by voice recognition from the voice measured by the microphone 41.
- the input data generation unit 21 identifies the conversation content by each speaker by voice recognition from the voice data of the predetermined time acquired from the microphone 41, and calculates the feature value of the conversation content for each predetermined time.
- the characteristic value of the conversation content is calculated from the characteristic value of the word used in the conversation.
- the feature value of a word may be specified from the correlation strength between the word and the color, for example, as shown in Patent Document 1.
- the feature value of the word may be represented by a vector using the adjective pair shown in FIG. 2 as an index.
- This vector may be calculated from the relationship between the phonological characteristics of a word and the evaluation scale, for example, as described in Japanese Patent No. 5678836.
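- As a rough illustration of deriving a conversation-level feature value from word-level vectors, the sketch below averages the vectors of the recognized words. Averaging is an assumption chosen for simplicity, since the text does not specify how word values are combined. The lexicon entries and the two adjective-pair axes are made up for the example.

```python
def content_feature(words, lexicon):
    """Average the feature vectors of the words found in the lexicon.

    lexicon: word -> vector, one component per adjective pair
             (hypothetical values; the real axes are those of FIG. 2).
    Returns None when no word in the conversation has a known vector.
    """
    vectors = [lexicon[w] for w in words if w in lexicon]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


# Hypothetical axes: (bright-dark, warm-cold), each in the range -1 to 1.
LEXICON = {"sunny": [0.8, 0.6], "storm": [-0.6, -0.2]}
feature = content_feature(["sunny", "storm", "unlisted"], LEXICON)
```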
- The case where the feature value of the conversation content is calculated from the words used in the conversation has been described, but the present invention is not limited to this.
- The semantic content of the conversation may be identified, and the feature value of the conversation content may be calculated from that semantic content.
- The input data generation unit 21 includes the feature value of the conversation content at predetermined time intervals in the input data 11.
- the body movement synchronization value of a plurality of speakers is a value that indexes the synchronization of movements by a plurality of speakers at predetermined time intervals.
- the body movement synchronization value is high when the body movements of each speaker are synchronized, and is low when the body movements of each speaker are not synchronized.
- the input data generation unit 21 acquires the body movement of each speaker from the body movement sensor 42, and calculates an index such as a body movement synchronization value.
- the body movement sensor 42 outputs the transition of the body movement of each speaker, for example, as shown in FIGS. 3A to 3C.
- The input data generation unit 21 compares the transitions of the body movements of the speakers shown in FIG. 3 at predetermined time intervals.
- the input data generation unit 21 calculates the body movement synchronization value by indexing the degree of synchronization of the body movements of each speaker at predetermined time intervals.
- The input data generation unit 21 includes the body movement synchronization value for each predetermined time in the input data 11.
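- The text does not specify how the degree of synchronization is indexed. One plausible choice, shown here purely as an assumption, is the mean pairwise Pearson correlation of the speakers' body movement traces within one time window. The function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length traces (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def sync_value(traces):
    """Mean pairwise correlation over all speaker pairs in one window.

    traces: one movement trace per speaker, all of the same length.
    High when speakers move together, low when they do not.
    """
    pairs = list(combinations(traces, 2))
    if not pairs:
        raise ValueError("at least two speakers are required")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

- With this choice the value ranges from -1 to 1, so it would need rescaling if a non-negative index is required.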
- the number of nods by multiple speakers is the total number of nods of each speaker at a predetermined time.
- the input data generation unit 21 totals the number of nods of each speaker at a predetermined time based on the data acquired from the body motion sensor 42, and calculates the number of nods by a plurality of speakers.
- the input data generation unit 21 includes the number of nods for each predetermined time in the input data 11.
- the amount of laughter by multiple speakers is the total laughter time of each speaker in a predetermined time.
- the input data generation unit 21 calculates the amount of laughter by a plurality of speakers by totaling the laughter time for each predetermined time based on the voice data acquired from the microphone 41.
- the input data generation unit 21 includes the amount of laughter at predetermined time intervals in the input data 11.
- Data other than the above may be included as data for identifying the conversation and the state of the speaker.
- the input data generation unit 21 may include the sigh amount, the heart rate for a predetermined time acquired from the biological sensor, the body temperature, the blood pressure, the data acquired from the heart rate fluctuation sensor 43, and the like in the input data 11.
- the heart rate is the heart rate of the speaker at a predetermined time.
- the heart rate is obtained, for example, from a sensor that detects the heart rate.
- Body temperature and blood pressure are obtained from biosensors attached to the speaker.
- the sigh amount can be calculated based on the voice data acquired from the microphone 41, similarly to the laughter amount described above.
- The data acquired from the heartbeat fluctuation sensor 43 is, for each time, the data of LF / HF, which is an amplification factor of the parasympathetic nerve, and LF, which is an amplification factor of the sympathetic nerve.
- The graph shown in FIG. 4 is the data of the heartbeat fluctuation sensor 43 provided for one speaker.
- The input data generation unit 21 may include the data of the heartbeat fluctuation sensor 43 provided for each speaker in the input data 11.
- the data that identifies the state of the spatial environment in which the speaker exists is a characteristic value of the spatial environment.
- The feature values of the spatial environment are the temperature and humidity of the spatial environment, the color or brightness of the spatial lighting, the feature value of the sound of the spatial environment, the feature value of the fragrance, the feature value of the image projected in the spatial environment, the wind direction and air volume of the spatial environment, and the like.
- Each feature value of the spatial environment is measured at predetermined time intervals and is included in the input data 11.
- the input data generation unit 21 may generate data for specifying the state of the spatial environment from the data acquired from the device for controlling the spatial environment and include it in the input data 11.
- The feature value of the sound in the spatial environment is the feature value of the music playing in the spatial environment, specifically the mel frequency for each time interval.
- The input data generation unit 21 converts the music playing in the space into a mel frequency for each time interval and includes it in the input data 11. Further, the input data generation unit 21 may identify the music playing in the space based on the data acquired from the microphone 41, convert it into data such as volume and tempo, and include it in the input data 11.
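- For reference, a commonly used conversion between frequency in hertz and the mel scale is sketched below. The text does not say which mel formula is used, so the 2595 and 700 constants (a widely used variant) are an assumption.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595 / 700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

- Under this variant, 1000 Hz maps to roughly 1000 mels, and the scale compresses higher frequencies.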
- The music playing in the space may be music picked up by the microphone 41, or music that the learning device 1 instructs the speaker 51 to output.
- As described above, the sound of the spatial environment may include any sound that can be perceived in the spatial environment, such as environmental sounds like flowing water, sporadic sounds like sound effects, sounds output by the speaker 51, and sounds generated in the vicinity of the space.
- the characteristic value of the spatial environment may be an instruction value for the learning device 1 to control each device.
- The color or brightness of the spatial lighting, which is a feature value of the spatial environment, may be the instruction value of the color or brightness that the learning device 1 inputs to the lighting controller 53.
- the characteristic value of the music flowing in the spatial environment may be specified from the music input by the learning device 1 to the speaker 51.
- the characteristic value of the scent may be specified from the scent identifier input by the learning device 1 to the aroma shooter 52.
- the feature value of the image may be specified from the identifier of the image input by the learning device 1 to the projector 54.
- the input data 11 generated by the input data generation unit 21 is referred to by the model learning unit 22.
- the model learning unit 22 learns the spatial environment model from the input data 11.
- the spatial environment model shows the correlation between conversation satisfaction, conversation content feature values, and spatial environment feature values.
- the spatial environment model shows how the satisfaction level of a conversation changes as the situation of the space changes or the content of the conversation changes.
- As the learning method in the model learning unit 22, any algorithm such as machine learning or deep learning may be used. When deep learning is used, the current situation is input to the first layer, and the output layer outputs a prediction of the future situation. As the intermediate layers, for example, 14 or more layers may be used. To avoid overfitting, the model learning unit 22 may use dropout. The model learning unit 22 stores, in the storage device 10, the spatial environment model data 12 that identifies the spatial environment model obtained by learning.
- The model learning process by the model learning unit 22 will be described with reference to FIG.
- In step S11, the model learning unit 22 refers to the input data 11 and calculates the satisfaction level of the conversation at each predetermined time from the amount of utterance at that time.
- In step S12, the model learning unit 22 learns the correlation among the satisfaction level of the conversation calculated in step S11, the feature value of the conversation content at that time, and the feature value of the spatial environment, and generates a spatial environment model.
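- The two steps can be pictured with a deliberately simple stand-in for the learner. The text allows any machine learning or deep learning algorithm. The sketch below instead averages satisfaction per combination of conversation content and spatial environment, which is an assumption made only to keep the example short. The record keys and labels are hypothetical.

```python
from collections import defaultdict


def learn_model(records, satisfaction_fn):
    """Stand-in learner: mean satisfaction per (content, environment).

    records: iterable of dicts with hypothetical keys
             'utterance_s', 'content', and 'environment'.
    satisfaction_fn: maps an utterance amount to a satisfaction level.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for r in records:
        # Step S11: satisfaction from the utterance amount of the window.
        s = satisfaction_fn(r["utterance_s"])
        key = (r["content"], r["environment"])
        sums[key][0] += s
        sums[key][1] += 1
    # Step S12: relate satisfaction to content and environment.
    return {k: total / count for k, (total, count) in sums.items()}


records = [
    {"utterance_s": 20.0, "content": "brainstorm", "environment": "warm_light"},
    {"utterance_s": 10.0, "content": "brainstorm", "environment": "warm_light"},
    {"utterance_s": 30.0, "content": "brainstorm", "environment": "cool_light"},
]
model = learn_model(records, lambda u: min(u / 30.0, 1.0))
```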
- the satisfaction level of conversation is calculated based on the amount of utterance.
- "conversational air" is expressed as an index of conversation satisfaction based on the amount of utterance. For example, when there is a large amount of utterance in the field where a discussion is held, it is considered that opinions are actively exchanged, so it is decided that the satisfaction level of the conversation is high. In addition, when the amount of conversation is small, it is considered that sufficient exchange of opinions has not been made, so it is determined that the satisfaction level of the conversation is low. If the amount of conversation is too large, it is considered that each speaker speaks without listening to the opinions of others, so that the satisfaction level of the conversation is determined to be low.
- the method of determining the satisfaction level of the conversation shown here is an example, and is not limited to this.
- the model learning unit 22 may calculate the satisfaction level of the conversation in consideration of not only the total amount of utterances of each speaker but also the balance of the amount of utterances of each speaker.
- the method of calculating the satisfaction level of conversation may be appropriately changed according to the characteristics of the space. For example, in a conference room where discussions are held, it is considered that the greater the amount of utterance, the higher the satisfaction level of the conversation. On the other hand, in hospital waiting rooms, hotel lounges, etc., it is considered that conversation satisfaction is higher when the amount of utterance is smaller.
- a function showing the correlation between the utterance amount and the conversation satisfaction level may be prepared for each space, and the model learning unit 22 may calculate the conversation satisfaction level from the utterance amount by referring to the function according to the space.
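- A per-space function of this kind might look like the following sketch. The shapes follow the text (too little or too much talk lowers satisfaction in a discussion, while less talk is better in a lounge or waiting room). The triangular and linear forms, the peak position, the space names, and the 0-to-1 scale are all assumptions.

```python
def conference_satisfaction(ratio, peak=1.0):
    """Inverted-V: highest at `peak`, zero at no talk or at twice the peak.

    ratio: total speech time of all speakers divided by the window length.
    """
    return max(0.0, 1.0 - abs(ratio - peak) / peak)


def lounge_satisfaction(ratio):
    """Decreasing: quieter windows score higher."""
    return max(0.0, 1.0 - ratio)


# Hypothetical registry of one function per kind of space.
SATISFACTION_BY_SPACE = {
    "conference_room": conference_satisfaction,
    "lounge": lounge_satisfaction,
}


def satisfaction(space, total_speech_s, window_s):
    """Look up the function for the space and apply it to the utterance amount."""
    return SATISFACTION_BY_SPACE[space](total_speech_s / window_s)
```

- Keeping the functions in a lookup table means a new kind of space only needs a new entry, without changing the caller.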
- the model learning unit 22 may further use the body movement synchronization value, the number of nods, the amount of laughter, the fluctuation of the heartbeat, the amount of sigh, etc. as the satisfaction level of the conversation.
- when the body movement synchronization value is high, each speaker is considered to be performing the same movements and embodying empathy, so the satisfaction level of the conversation is calculated to be high.
- when the body movement synchronization value is low, the speakers are considered not to be performing the same movements, so the satisfaction level of the conversation is calculated to be low.
- when the stress is high, the satisfaction level of the conversation is calculated to be low; when the stress is low, the satisfaction level of the conversation is calculated to be high.
- the satisfaction level of the conversation may be calculated as appropriate, for example, the satisfaction level of the conversation may be calculated based on a predetermined function.
- the model learning unit 22 may calculate the satisfaction level of the conversation by referring to the function according to the space, as in the case of calculating the satisfaction level of the conversation based on the amount of utterance.
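A "predetermined function" that combines several indicators might look like the sketch below. The source does not give a formula; the equal weights, the stress discount, the assumption that every indicator is already normalized to the range 0 to 1, and all names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    """Indicators normalized to 0..1 (the normalization is an assumption)."""
    utterance_score: float  # satisfaction derived from the utterance amount
    body_sync: float        # body movement synchronization value
    nods: float             # number of nods
    laughter: float         # amount of laughter
    stress: float           # stress, e.g. from heartbeat fluctuation or sighs


def combined_satisfaction(x: Indicators) -> float:
    """Average the positive indicators, then discount by stress (assumed rule)."""
    positive = 0.25 * (x.utterance_score + x.body_sync + x.nods + x.laughter)
    # Full stress halves the score; no stress leaves it unchanged.
    score = positive * (1.0 - 0.5 * x.stress)
    return max(0.0, min(1.0, score))
```

Higher synchronization, nods, or laughter raise the result, and higher stress lowers it, which matches the direction of each relationship described above.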
- the spatial environment model generated by the model learning unit 22 is referred to by the spatial environment control unit 25.
- the spatial environment control unit 25 refers to the spatial environment model and controls the space so that the “air of conversation”, specifically, the satisfaction level of the conversation is high.
- the current state data acquisition unit 23 and the current state analysis unit 24 grasp the current state in the controlled target space.
- data other than the amount of utterance may be treated as arbitrary (optional) data.
- the arbitrary data includes, for example, the body movement synchronization value, the number of nods, the amount of laughter, the heartbeat fluctuation, the temperature and humidity of the spatial environment, the color or brightness of the spatial lighting, the characteristic value of the music played in the spatial environment, the characteristic value of the fragrance, and the feature value of the image projected in the spatial environment.
- the model learning unit 22 generates the spatial environment model taking into account that such arbitrary data may be impossible to acquire or to control when the spatial environment model is referred to. For example, to handle a space in which no temperature / humidity meter is installed or in which the temperature / humidity cannot be controlled, a dummy variable indicating whether the temperature / humidity is applicable may be set in the input data 11.
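The dummy variable for temperature / humidity can be sketched as an availability flag placed in front of the measured values. The function name and the layout of the encoded features are assumptions, not the format of the input data 11 in the source.

```python
from typing import List, Optional


def encode_temp_humidity(
    temperature_c: Optional[float], humidity_pct: Optional[float]
) -> List[float]:
    """Encode temperature / humidity with a leading availability flag.

    Returns [flag, temperature, humidity]. The flag is 1.0 when both
    values are available and 0.0 otherwise; unavailable values are
    zero-filled so every record has the same length.
    """
    if temperature_c is None or humidity_pct is None:
        return [0.0, 0.0, 0.0]
    return [1.0, temperature_c, humidity_pct]
```

Keeping the flag as a separate feature lets a learned model distinguish "not measured" from a genuine reading of zero.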
- the current state data acquisition unit 23 acquires the current state data of the space to be controlled by the spatial environment control unit 25.
- the current state data acquisition unit 23 generates the current state data 13 in the same manner as the input data generation unit 21 acquires data from each device connected to the learning device 1 and generates the input data 11, and stores the current state data 13 in the storage denoted by reference numeral 10.
- the current state analysis unit 24 refers to the current state data 13 and calculates the current satisfaction level from the feature value of the current conversation content and the amount of utterance in the current conversation by the speaker.
- the current state analysis unit 24 calculates the feature value of the conversation content and the utterance amount of the conversation from the voice data acquired from the microphone 41 provided in the control target space.
- the current state analysis unit 24 further calculates the current satisfaction level from the calculated utterance amount.
- the current state analysis unit 24 calculates the feature value of the conversation content and the satisfaction level of the conversation in the same manner as the processing in the input data generation unit 21. At this time, the current state analysis unit 24 may further use the body movement synchronization value, the number of nods, the amount of laughter, the heartbeat fluctuation, and the like as the satisfaction level of the conversation.
- the spatial environment control unit 25 calculates the characteristic value of the spatial environment that raises the satisfaction level higher than the current satisfaction level from the characteristic value of the current conversation content and the spatial environment model.
- the spatial environment control unit 25 controls the spatial environment based on the calculated feature values.
- the spatial environment model shows how the satisfaction level of conversation changes as the situation of the space and the content of the conversation change. Therefore, the spatial environment control unit 25 refers to the spatial environment model and identifies, from the current conversation content, a spatial environment in which the satisfaction level of the conversation is higher than at present.
- the spatial environment control process by the spatial environment control unit 25 will be described with reference to FIG.
- in step S21, the spatial environment control unit 25 acquires the current satisfaction level calculated by the current state analysis unit 24.
- in step S22, the spatial environment control unit 25 acquires the feature value of the current conversation content calculated by the current state analysis unit 24.
- in step S23, the spatial environment control unit 25 refers to the spatial environment model and calculates a control value of the spatial environment so that the satisfaction level is higher than the current satisfaction level acquired in step S21.
- in step S24, the spatial environment control unit 25 controls the device for controlling the spatial environment according to the control value calculated in step S23.
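Steps S21 to S23 can be sketched as a search over candidate environment settings. This is a minimal illustration under stated assumptions: the spatial environment model is treated as a callable that predicts satisfaction from conversation features and an environment setting, and the candidate list is supplied by the caller. Neither the search strategy nor any of the names come from the source.

```python
from typing import Callable, Dict, Optional, Sequence

Env = Dict[str, float]
# Hypothetical model interface: (conversation features, environment) -> satisfaction.
Model = Callable[[Sequence[float], Env], float]


def choose_control_value(
    model: Model,
    conversation_features: Sequence[float],  # acquired in step S22
    current_satisfaction: float,             # acquired in step S21
    candidates: Sequence[Env],
) -> Optional[Env]:
    """Step S23: pick the candidate with the highest predicted satisfaction.

    Returns None when no candidate is predicted to exceed the current
    satisfaction, in which case the environment is left unchanged.
    """
    best_env: Optional[Env] = None
    best_score = current_satisfaction
    for env in candidates:
        score = model(conversation_features, env)
        if score > best_score:
            best_env, best_score = env, score
    return best_env
```

The returned setting would then be passed to the devices in step S24.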
- the spatial environment control unit 25 inputs the feature value of the current conversation content calculated by the current state analysis unit 24 into the spatial environment model, and calculates the characteristic value of the spatial environment at which the satisfaction level becomes higher than the current satisfaction level calculated by the current state analysis unit 24.
- the target satisfaction level may be set to any value, as long as it is higher than the current satisfaction level.
- the target satisfaction level may be indicated by a fixed value.
- the target satisfaction level may be expressed as a ratio to the current satisfaction level, such as 150% with respect to the current satisfaction level. Satisfaction is calculated from the amount of utterance, body movement synchronization value, number of nods, amount of laughter, heart rate fluctuation, and the like.
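The two ways of specifying the target satisfaction level, a fixed value or a ratio to the current level, can be sketched as follows. The function name and the error handling are assumptions.

```python
from typing import Optional


def target_satisfaction(
    current: float,
    fixed: Optional[float] = None,
    ratio: Optional[float] = None,
) -> float:
    """Return the target satisfaction from a fixed value or a ratio.

    `ratio=1.5` corresponds to 150% of the current satisfaction level.
    Raises ValueError when neither is given or when the target would
    not exceed the current level.
    """
    if fixed is not None:
        target = fixed
    elif ratio is not None:
        target = current * ratio
    else:
        raise ValueError("either fixed or ratio must be given")
    if target <= current:
        raise ValueError("target must be higher than the current satisfaction")
    return target
```

For example, `target_satisfaction(0.4, ratio=1.5)` yields a target of about 0.6.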
- the spatial environment control unit 25 controls the device for controlling the spatial environment based on the calculated control value of the spatial environment.
- the spatial environment control unit 25 calculates the color or brightness of the spatial lighting, the characteristic value of the music played in the spatial environment, and the like as conditions for increasing the current satisfaction level.
- the spatial environment control unit 25 inputs the calculated color or brightness of the spatial illumination to the illumination controller 53. This allows the space to be changed to a color or brightness that makes the conversation more satisfying.
- the spatial environment control unit 25 inputs music data corresponding to the characteristic values of the music flowing in the spatial environment to the speaker 51. As a result, music that enhances the satisfaction of conversation can be played in the space.
- the spatial environment control unit 25 may calculate the scent of the space, the image displayed by the projector 54, and the like as conditions for increasing the current satisfaction level.
- the spatial environment control unit 25 inputs the calculated scent of the space to the aroma shooter 52. This makes it possible to change the space to a scent that enhances the satisfaction of conversation.
- the spatial environment control unit 25 inputs the calculated image to the projector 54. As a result, an image that raises the satisfaction level of the conversation can be displayed in the space.
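Routing each calculated value to its device can be sketched as a dispatch table. The handler interface below is hypothetical; in practice each handler would wrap whatever interface the illumination controller 53, the speaker 51, the aroma shooter 52, or the projector 54 actually provides.

```python
from typing import Any, Callable, Dict, List


def dispatch_control_values(
    control_values: Dict[str, Any],
    devices: Dict[str, Callable[[Any], None]],
) -> List[str]:
    """Send each control value to the handler registered for its item.

    Items without a registered handler are skipped, which covers spaces
    where a given device is not installed. Returns the items that were
    actually sent, in the order of `control_values`.
    """
    sent: List[str] = []
    for item, value in control_values.items():
        handler = devices.get(item)
        if handler is None:
            continue
        handler(value)
        sent.append(item)
    return sent
```

A space with only lighting and music would register two handlers, and any scent or image values would simply be ignored.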
- with the learning device 1, it is possible to control the spatial environment in consideration of the air of conversation.
- the learning device 1 can change the BGM, temperature, humidity, scent, lighting, etc. of the space in which the conversation takes place, and can provide a space in which the satisfaction of the conversation is high.
- the input data 11 is data that associates the amount of speech, the characteristic value of the conversation content, the characteristic value of the spatial environment, and the like at a predetermined time.
- the spatial environment model is obtained by learning the input data 11, and shows the correlation between the satisfaction level, the characteristic value of the conversation content, and the characteristic value of the spatial environment.
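The relation between the input data 11 and the spatial environment model can be illustrated with a small stand-in learner. The source does not state the learning algorithm; the nearest-neighbor approach, the record fields, and the class names below are all assumptions chosen to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class InputRecord:
    """One row of input data 11 at a given time (field names are assumed)."""
    time: str
    utterance_amount: float
    conversation_features: Sequence[float]
    environment_features: Sequence[float]


class NearestNeighborModel:
    """Stand-in for the spatial environment model (not the source's method)."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.samples: List[Tuple[List[float], float]] = []

    def fit(
        self,
        records: Sequence[InputRecord],
        satisfaction_fn: Callable[[float], float],
    ) -> "NearestNeighborModel":
        # Store (conversation + environment features, satisfaction) pairs.
        self.samples = [
            (
                list(r.conversation_features) + list(r.environment_features),
                satisfaction_fn(r.utterance_amount),
            )
            for r in records
        ]
        return self

    def predict(
        self,
        conversation_features: Sequence[float],
        environment_features: Sequence[float],
    ) -> float:
        """Mean satisfaction of the k stored samples closest to the query."""
        query = list(conversation_features) + list(environment_features)

        def squared_distance(sample: Tuple[List[float], float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(sample[0], query))

        nearest = sorted(self.samples, key=squared_distance)[: self.k]
        return sum(score for _, score in nearest) / len(nearest)
```

`predict` assumes `fit` has been called with at least one record. Any regression method that maps conversation and environment features to satisfaction could take the place of this class.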
- the variations listed here are examples and are not limited to these.
- the input data 11 may further include the emotions of the speaker, and the satisfaction level of the spatial environment model may further include the emotions of the speaker.
- the emotions of the speaker may be divided into positive emotions such as laughing and negative emotions such as anger. In the case of positive emotions, it is calculated so that the satisfaction level is high, and in the case of negative emotions, it is calculated so that the satisfaction level is low.
- the emotions of the speaker may be obtained from an annotation model showing the correlation between emotions and conversation content, which is generated by using emotions attached to conversations as teacher data.
- the annotation model is generated before the spatial environment model is learned.
- the teacher data is generated in advance by having the speaker himself or herself, or a worker, attach the speaker's emotions to the conversation content.
- by learning this teacher data, an annotation model showing the correlation between emotions and conversation content is generated.
- the conversation content may be converted into conversation features by the adjective scale shown in FIG.
- the annotation model may show a correlation not only with the conversation content but also with other indexes such as the spoken voice of the speaker by learning.
- when the input data generation unit 21 acquires the conversation content of the speaker, it can query the annotation model and generate the emotion of the speaker by computer processing. Further, when controlling the space with reference to the spatial environment model, the current state analysis unit 24 can acquire the emotions of the speaker from the content of the conversation currently in progress and calculate the satisfaction level.
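Training an annotation model from teacher data and then querying it can be sketched with a very small word-count classifier. This is only a stand-in: the source does not describe the model type, and the function names, the whitespace tokenization, and the "neutral" fallback are assumptions.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple

AnnotationModel = DefaultDict[str, Counter]


def train_annotation_model(
    teacher_data: Iterable[Tuple[str, str]]
) -> AnnotationModel:
    """Count, for each word, how often it appears under each emotion label.

    `teacher_data` is an iterable of (conversation text, emotion) pairs,
    i.e. conversations with emotions attached in advance.
    """
    counts: AnnotationModel = defaultdict(Counter)
    for text, emotion in teacher_data:
        for word in text.lower().split():
            counts[word][emotion] += 1
    return counts


def annotate(model: AnnotationModel, text: str, default: str = "neutral") -> str:
    """Return the emotion with the most word votes, or `default` if none."""
    votes: Counter = Counter()
    for word in text.lower().split():
        # .get avoids creating empty entries for unseen words.
        votes.update(model.get(word, {}))
    return votes.most_common(1)[0][0] if votes else default
```

The emotion returned by `annotate` could then raise the satisfaction level when it is positive and lower it when it is negative.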
- the input data 11 may further include the feature amount of the spoken voice of the speaker, and the satisfaction level of the spatial environment model may further include the feature amount of the spoken voice.
- the feature amount of the spoken voice is the tone of the voice of the speaker, specifically, the mel frequency.
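The source does not say how the mel frequency is computed. One widely used convention for converting a frequency in hertz to the mel scale is shown below as a reference sketch; the function names are assumptions.

```python
import math


def hz_to_mel(frequency_hz: float) -> float:
    """Convert hertz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this convention 1000 Hz maps to approximately 1000 mel, and the scale grows more slowly than frequency above that point, reflecting how pitch differences are perceived.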
- the conversational atmosphere of the speaker can be indexed by the feature amount of the spoken voice.
- the input data 11 may further include the heartbeat of the speaker, and the satisfaction of the spatial environment model may further include the heartbeat.
- from the heartbeat, it is possible to grasp the state of the speaker, such as whether the speaker is in a relaxed state or a stressed state. For example, a heart rate within a predetermined range of low normal values indicates a relaxed state, and a high heart rate exceeding the predetermined range indicates a stressed state.
- when a relaxed state is indicated, the satisfaction level is calculated to be high, and when a stressed state is indicated, the satisfaction level is calculated to be low.
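The heart-rate rule can be sketched as a range check. The range of 50 to 70 beats per minute is a placeholder for the "predetermined range" and is not taken from the source; the function names and the size of the adjustment are likewise assumptions.

```python
from typing import Tuple


def heart_state(bpm: float, relaxed_range: Tuple[float, float] = (50.0, 70.0)) -> str:
    """Classify the speaker's state from the heart rate.

    Inside the range is "relaxed", above it is "stressed". Values below
    the range are not covered by the description, so they are "unknown".
    """
    low, high = relaxed_range
    if low <= bpm <= high:
        return "relaxed"
    if bpm > high:
        return "stressed"
    return "unknown"


def adjust_satisfaction(base: float, state: str, step: float = 0.25) -> float:
    """Raise satisfaction when relaxed, lower it when stressed, clamp to 0..1."""
    if state == "relaxed":
        base += step
    elif state == "stressed":
        base -= step
    return max(0.0, min(1.0, base))
```

The same adjustment could be driven by the heartbeat fluctuation or by a facial expression classified as positive or negative.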
- the input data 11 may further include the heartbeat fluctuation of the speaker, and the satisfaction level of the spatial environment model may further include the heartbeat fluctuation. From the fluctuation of the heartbeat, it is possible to grasp the state of the speaker such as whether the person is in a relaxed state or a stressed state. When the speaker is in a relaxed state, the satisfaction level is calculated to be high, and when the speaker is in a stress state, the satisfaction level is calculated to be low.
- the input data 11 may further include the image data of the speaker, and the satisfaction level of the spatial environment model may further include the image data of the speaker.
- the image data is acquired from the video data of the speaker taken by the camera.
- the facial expression of the speaker is determined from the image data of the speaker, and in the case of a positive facial expression, the satisfaction level is calculated to be high, and in the case of a negative facial expression, the satisfaction level is calculated to be low. For example, when the speaker has a laughing facial expression, the satisfaction level is calculated to be high. If the speaker has an angry facial expression, the satisfaction level is calculated to be low.
- the model learning unit 22 may calculate the satisfaction level in consideration of the stress level of the speaker and generate a spatial environment model. Specifically, when the stress of the speaker is high, the satisfaction level is calculated to be low, and when the stress of the speaker is low, the satisfaction level is calculated to be high.
- the stress level of the speaker is grasped from, for example, heartbeat, heartbeat fluctuation, image data of the speaker, and the like.
- the spatial environment model shows the correlation among the satisfaction level including the amount of speech, the stress level of the speaker, the characteristic value of the conversation content, and the characteristic value of the spatial environment.
- the learning device 1 may display data input / output when learning the spatial environment model and when controlling the space using the learned spatial environment model on a display or the like.
- the input / output data is, for example, the measured value of each measuring device input to the learning device 1, or the data of each item of the generated input data 11, such as the satisfaction level (for example, the amount of speech in the conversation), the stress level, the feature amount of the conversation content, and the feature amount of the spatial environment.
- the learning device 1 may display these data transitions in a graph or the like.
- the learning program and the space control program can be stored on a computer-readable recording medium such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), a USB (Universal Serial Bus) memory, a CD (Compact Disc), or a DVD (Digital Versatile Disc), or can be distributed via a network.
- the learning device described in the embodiment of the present invention may be configured on a single piece of hardware as shown in FIG. 1, or may be configured on a plurality of pieces of hardware according to its functions and the number of processes. Further, the learning device according to the embodiment of the present invention may be integrally configured with home appliances such as speakers and lighting. It may also be realized on a computer that executes another processing program.
- the learning device may create a spatial environment model, and the spatial control device (not shown) may control the spatial environment by using the spatial environment model generated by the learning device.
- each function of the space control device is realized by a general-purpose computer executing the space control program.
- the space control device according to the embodiment of the present invention may be configured on a single piece of hardware as in the above-mentioned learning device, or may be configured on a plurality of pieces of hardware according to its functions and the number of processes. Further, the space control device according to the embodiment of the present invention may be integrally configured with home appliances such as speakers and lighting.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021524937A JPWO2020246600A1 | 2019-06-07 | 2020-06-05 | |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019-107162 | 2019-06-07 | | |
| JP2019107162 | 2019-06-07 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020246600A1 (ja) | 2020-12-10 |
Family
ID=73652562
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/022388 Ceased WO2020246600A1 (ja) | 2019-06-07 | 2020-06-05 | Learning device, space control device, learning program, and space control program |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JPWO2020246600A1 |
| WO (1) | WO2020246600A1 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
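| WO2022145165A1 (ja) * | 2020-12-28 | 2022-07-07 | Panasonic Intellectual Property Management Co Ltd | Environmental control system and environmental control method |
| JP2024014612A (ja) * | 2022-07-22 | 2024-02-01 | Toyota Motor Corp | Information processing device and information processing program |
| JP2025049226A (ja) * | 2023-09-21 | 2025-04-03 | SoftBank Group Corp | System |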
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
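| JP2006282115A (ja) * | 2005-04-04 | 2006-10-19 | Denso Corp | User hospitality system for automobiles |
| JP2008126818A (ja) * | 2006-11-20 | 2008-06-05 | Denso Corp | User hospitality system for automobiles |
| JP2009294790A (ja) * | 2008-06-03 | 2009-12-17 | Denso Corp | Information providing system for automobiles |
| JP2018169506A (ja) * | 2017-03-30 | 2018-11-01 | Toyota Motor Corp | Conversation satisfaction estimation device, speech processing device, and conversation satisfaction estimation method |
| JP2019062490A (ja) * | 2017-09-28 | 2019-04-18 | Oki Electric Industry Co Ltd | Control device, control method, program, and control system |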
2020
- 2020-06-05 WO PCT/JP2020/022388 patent/WO2020246600A1/ja not_active Ceased
- 2020-06-05 JP JP2021524937A patent/JPWO2020246600A1/ja active Pending
Non-Patent Citations (7)
| Title |
|---|
| FUJITA, KAZUYUKI ET AL.: "An Implementation and Evaluation of Room-Shaped System Using Ambient Suite for Communication Support in Party Situations", PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENTERTAINMENT TECHNOLOGY, vol. J96-D, no. 1, 1 January 2013 (2013-01-01), pages 120 - 132, XP055768708 * |
| ISO, YUKIKO ET AL.: "The effects of nodding on impression formation in while conversation members are speaking : The role of nonverbal behaviors in a triadic communication", TECHNICAL REPORT OF IEICE, vol. 103, no. 410, 31 October 2003 (2003-10-31), pages 31 - 36 * |
| ONDA, HIROKAZU: "Expanding target!, A new color appears in a favorable "Pekoppa"", TOYJOURNAL, 1 February 2009 (2009-02-01), pages 139 * |
| ONDA, HIROKAZU: "How about plants that read the air'?", TOYJOURNAL, 1 September 2008 (2008-09-01), pages 101 * |
| ONO, HIROSHI ET AL.: "An information and communication technology which works on our bodily feeling (2): effects of an ambient presentation of environmental information on bodily feeling", 25 September 2006 (2006-09-25), pages 165 - 170 * |
| TAKEMI, TSUZUKI ET AL.: "A Method for Sensing Synchrony between Communicating Persons by Sense Chair and the Evaluation toward Conversation", THE TRANSACTIONS OF HUMAN INTERFACE SOCIETY, vol. 19, no. 2, 2017, pages 151 - 162 * |
| TAKEMURA, HARUO: "Perspectives on Ambient Interface Technologies", JOURNAL OF JAPANESE, vol. 28, no. 2, 1 March 2013 (2013-03-01), pages 186 - 193 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
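| WO2022145165A1 (ja) * | 2020-12-28 | 2022-07-07 | Panasonic Intellectual Property Management Co Ltd | Environmental control system and environmental control method |
| JP2024014612A (ja) * | 2022-07-22 | 2024-02-01 | Toyota Motor Corp | Information processing device and information processing program |
| JP7694495B2 (ja) | 2022-07-22 | 2025-06-18 | Toyota Motor Corp | Information processing device, information processing method, and information processing program |
| JP2025049226A (ja) * | 2023-09-21 | 2025-04-03 | SoftBank Group Corp | System |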
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2020246600A1 | 2020-12-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9824606B2 (en) | | Adaptive system for real-time behavioral coaching and command intermediation |
| US10224060B2 (en) | | Interactive home-appliance system, server device, interactive home appliance, method for allowing home-appliance system to interact, and nonvolatile computer-readable data recording medium encoded with program for allowing computer to implement the method |
| US10311869B2 (en) | | Method and system for automation of response selection and composition in dialog systems |
| TW494308B (en) | | Control method |
| WO2020246600A1 (ja) | | Learning device, space control device, learning program, and space control program |
| Bergsland et al. | | Turning movement into music: Issues and applications of the MotionComposer, a therapeutic device for persons with different abilities |
| JP6719739B2 (ja) | | Dialogue method, dialogue system, dialogue device, and program |
| KR20200074680A (ko) | | Terminal device and control method thereof |
| WO2018168427A1 (ja) | | Learning device, learning method, speech synthesis device, and speech synthesis method |
| JP7701762B2 (ja) | | Conversation-based mental disorder screening method and device therefor |
| CN110587621A (zh) | | Robot, robot-based patient care method, and readable storage medium |
| JP2018087872A (ja) | | Information processing device, information processing system, information processing method, and program |
| Grassi et al. | | Enhancing llm-based human-robot interaction with nuances for diversity awareness |
| WO2019138652A1 (ja) | | Information processing device, information processing system, information processing method, and program |
| JPWO2020246600A5 | | |
| WO2016052520A1 (ja) | | Dialogue device |
| JP7664960B2 (ja) | | Presentation control system, presentation control method, and program |
| WO2023286224A1 (ja) | | Conversation processing program, conversation processing system, and conversational robot |
| Ivsic et al. | | Transhuman Ansambl-Voice Beyond Language |
| WO2022085214A1 (ja) | | Conference support device, conference support system, and conference support method |
| CN121530779A (zh) | | Environment guidance method and apparatus based on smart home devices, and home server |
| Antaki et al. | | Mobilizing others when you have little (recognizable) language |
| JP6698428B2 (ja) | | Network system, information processing method, and server |
| JP7790028B2 (ja) | | Control method, control device, and program |
| US20260061152A1 (en) | | Sleep assistance through generation and evaluation of generative sleep content and/or sleep improvement interventions including training, adjusting, mediating, and/or integrating outputs of one or more ai models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20817638; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| | ENP | Entry into the national phase | Ref document number: 2021524937; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20817638; Country of ref document: EP; Kind code of ref document: A1 |