WO2022201876A1 - Control method, control device, and program - Google Patents

Control method, control device, and program

Info

Publication number
WO2022201876A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
user
sensor
unit
speaker
Prior art date
Application number
PCT/JP2022/003789
Other languages
French (fr)
Japanese (ja)
Inventor
Ryotaro Aoki
Original Assignee
Yamaha Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Publication of WO2022201876A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/12Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers

Definitions

  • the present invention relates to a control method, a control device, and a program.
  • This application claims priority based on Japanese Patent Application No. 2021-052696 filed on March 26, 2021, the entire disclosure of which is incorporated herein.
  • Patent Literature 1 discloses a technology related to an information processing apparatus capable of responding in accordance with the user's intention.
  • In the technology described in Patent Literature 1, feedback from the home console is output as voice through the user's interaction with a home console installed at a predetermined location. For example, when the user utters "Please heat the bath" and the bath has not finished heating by the time the user comes home, that fact is fed back to the user. However, the user is likely to move away after giving an instruction to the home console at a certain location. In such a case, since the user is no longer near the home console, there is a problem that the feedback from the home console cannot be heard.
  • An example of an object of the present invention is to provide a control method, a control device, and a program capable of selecting a speaker as the sound output destination so that the user can hear a sound notified to the user even when the user moves.
  • One aspect of the present invention is a control method performed by a computer, the method including: acquiring a detection result detected by at least one sensor provided in a space; specifying a position of a user in the space based on the detection result; selecting, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and causing the selected speaker to output sound.
  • Another aspect of the present invention is a control device including: an acquisition unit that acquires a detection result detected by at least one sensor provided in a space; a specifying unit that specifies a position of a user in the space based on the detection result; a selection unit that selects, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and an output unit that causes the selected speaker to output sound.
  • Another aspect of the present invention is a program that causes a computer to execute: acquiring a detection result detected by at least one sensor provided in a space; specifying a position of a user in the space based on the detection result; selecting, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and causing the selected speaker to output sound.
  • FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 in the embodiment.
  • FIG. 3 is a block diagram showing an example of the configuration of the control device 20 in the embodiment.
  • FIG. 4 is a diagram showing an example of the stationary sound information 220 in the embodiment.
  • FIG. 5 is a diagram showing an example of the notification sound information 221 in the embodiment.
  • FIG. 1 is a diagram showing an overview of a speaker system 1 according to an embodiment.
  • the speaker system 1 is applied in a space where a user lives, such as a house H, for example.
  • house H is provided with a plurality of sensors 10-1 to 10-3, a control device 20, and a plurality of speakers 30-1 to 30-3.
  • Sensors 10-1 to 10-3 (and sensors 10-1 to 10-N to be described later) are referred to as sensor 10 when they are not distinguished from each other.
  • The speakers 30-1 to 30-3 (and speakers 30-1 to 30-M to be described later) are referred to as the speaker 30 when they are not distinguished from each other.
  • Each of the multiple sensors 10 is provided in a different space in the house H.
  • each of the plurality of sensors 10 is provided in the entrance of the house H, the kitchen, the living room, the bedroom, and the like.
  • Each of the plurality of speakers 30 is also provided in a different space in the house H. Both the sensor 10 and the speaker 30 may be provided in the same space. Only one of the sensor 10 and the speaker 30 may be provided in a space such as the entrance of the house H, the kitchen, the living room, or the bedroom. Also, the number of sensors 10 and the number of speakers 30 connected to the control device 20 may be determined arbitrarily.
  • FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 according to the embodiment.
  • the speaker system 1 includes, for example, multiple sensors 10-1 to 10-N, a control device 20, and multiple speakers 30-1 to 30-M.
  • the plurality of sensors 10, the control device 20, and the plurality of speakers 30 are communicably connected via a communication network NW.
  • the communication network NW may be a wide area network, that is, a WAN (Wide Area Network), the Internet, or a combination thereof.
  • the communication network NW may be a communication network that is communicatively connected by a wired connection such as a cable or a wireless connection such as a wireless LAN.
  • the sensor 10 includes, for example, a sensor section and a communication section.
  • the sensor unit acquires information that can detect the position and movement of the user.
  • the communication unit transmits information acquired by the sensor unit to the control device 20 .
  • For example, the sensor unit is a microphone (see, for example, FIG. 9).
  • When the sensor unit is a microphone, it collects sound propagating in the space in which it is provided. It is possible to detect the user's position and movement by analyzing the sound collected by the microphone. A method for detecting the user's position and movement based on the sound collected by the microphone will be described later in detail.
  • The sensor unit is not limited to a microphone.
  • The sensor unit may be any sensor that can acquire information from which the position and movement of the user can be detected.
  • For example, the sensor unit may be an image sensor, an infrared sensor, a temperature sensor, an optical sensor, or the like.
  • When the sensor unit is an image sensor, it captures an image of the space in which the sensor 10 is provided. By analyzing the image, it is possible to detect the position and movement of the user.
  • When the sensor unit is an infrared sensor, a temperature sensor, an optical sensor, or the like, it detects the temperature measured by infrared rays or a thermometer, or the phase difference between irradiated light and reflected light. It is possible to detect the position and movement of the user based on changes in temperature, phase difference, and the like.
  • the speaker 30 is connected to the control device 20 and outputs sound based on the control of the control device 20 .
  • the speaker 30 may be any speaker device as long as it can output sound at least based on the control of the control device 20 .
  • the control device 20 is, for example, a computer device such as a PC (Personal Computer) or a server device.
  • the control device 20 receives information acquired by each of the sensors 10 .
  • the control device 20 identifies the position of the user based on the acquired information.
  • the control device 20 outputs sound from the speaker 30 provided near the position identified as the user's presence.
  • FIG. 3 is a block diagram showing an example of the configuration of the control device 20 in the embodiment.
  • the control device 20 includes a communication unit 21, a storage unit 22, and a control unit 23, for example.
  • the communication unit 21 is realized by, for example, a general-purpose communication IC (Integrated Circuit).
  • the communication unit 21 communicates with the sensor 10 and the speaker 30 via the communication network NW.
  • the storage unit 22 is realized by, for example, a storage device (a storage device with a non-transitory storage medium) such as a HDD (Hard Disk Drive) or flash memory, or a combination thereof.
  • the storage unit 22 stores a program for realizing each component (each function) of the control device 20, variables used when executing the program, and various kinds of information.
  • the storage unit 22 stores stationary sound information 220, notification sound information 221, installation information 222, and learned model information 223, for example. Details of the information stored in the storage unit 22 will be described later.
  • the control unit 23 is implemented by causing a CPU provided as hardware in the control device 20 to execute a program.
  • the control unit 23 includes an acquisition unit 230, an identification unit 231, a selection unit 232, an output unit 233, a device control unit 234, and a learning unit 235, for example.
  • the acquisition unit 230 acquires information acquired by the sensor 10 via the communication unit 21 and outputs the acquired information to the identification unit 231 .
  • the specifying unit 231 specifies the user's position based on the information acquired from the acquiring unit 230 . A method by which the identifying unit 231 identifies the position of the user will be described later in detail.
  • the specifying unit 231 outputs information indicating the specified position of the user to the selecting unit 232 .
  • the specifying unit 231 specifies the notification sound based on the information acquired from the acquiring unit 230 .
  • the notification sound is a sound to be notified to the user, and is, for example, a ringing sound of an interphone or the like.
  • the specifying unit 231 specifies, as the notification sound, a sound having frequency characteristics similar to the frequency characteristics of the sound stored in advance as the notification sound information 221 in the storage unit 22 .
  • the specifying unit 231 is an example of a “determination unit”.
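As an illustration of this frequency-characteristic comparison, here is a minimal sketch assuming the notification sound information stores per-band gains and that cosine similarity against a fixed threshold is an acceptable similarity measure; both choices, and all names below, are assumptions rather than details from the publication.

```python
import numpy as np

def band_gains(signal: np.ndarray, n_bands: int = 32) -> np.ndarray:
    """Summarise a sound's frequency characteristic as per-band average magnitudes."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])

def is_notification_sound(collected: np.ndarray,
                          registered_gains: list[np.ndarray],
                          threshold: float = 0.9) -> bool:
    """True if the collected sound resembles any sound registered as notification sound information."""
    gains = band_gains(collected)
    for ref in registered_gains:
        similarity = float(np.dot(gains, ref) /
                           (np.linalg.norm(gains) * np.linalg.norm(ref) + 1e-12))
        if similarity >= threshold:
            return True
    return False
```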
  • the selection unit 232 selects the speaker 30 close to the user's position based on the information indicating the user's position acquired from the identification unit 231 .
  • the selection unit 232 refers to the installation information 222 based on information indicating the user's position, for example.
  • the installation information 222 is information indicating the position where the speaker 30 is installed.
  • the selection unit 232 compares the position of the user with the positions where the speakers 30 are installed, and selects the speaker 30 close to the position where the user is. For example, the selection unit 232 selects the speaker 30 closest to the user from the plurality of speakers 30 .
  • the selection unit 232 outputs information indicating the selected speaker 30 to the output unit 233 .
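A minimal sketch of this selection step, assuming the installation information 222 can be reduced to 2-D coordinates and that "near" means smallest Euclidean distance; the coordinates and the dictionary layout below are illustrative assumptions.

```python
import math

# Hypothetical view of installation information 222B: speaker No -> installation position (x, y) in metres.
SPEAKER_POSITIONS = {1: (0.5, 0.5), 2: (4.0, 0.5), 3: (0.5, 3.5), 4: (4.0, 3.5)}

def select_speaker(user_position: tuple[float, float]) -> int:
    """Return the identifier of the speaker installed closest to the specified user position."""
    return min(SPEAKER_POSITIONS,
               key=lambda no: math.dist(SPEAKER_POSITIONS[no], user_position))

# Example: a user located at (3.8, 3.0) would be served by speaker No 4.
```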
  • Based on the information indicating the speaker 30 acquired from the selection unit 232, the output unit 233 causes the speaker 30 indicated in that information to output sound.
  • the sound that the output unit 233 causes the speaker 30 to output may be any sound as long as it is a sound that should be notified to the user.
  • the output unit 233 causes the speaker 30 to output the notification sound of an intercom or the like, music that the user is listening to, or the like.
  • the output unit 233 may set the speaker 30 that is not outputting sound among the plurality of speakers 30 to the standby mode.
  • the device control unit 234 comprehensively controls the control device 20 .
  • the device control unit 234 outputs information received by the communication unit 21 , that is, information acquired from the sensor 10 to the acquisition unit 230 .
  • the device control unit 234 also outputs information output by the output unit 233 , that is, information indicating the sound to be output by the speaker 30 to the communication unit 21 . As a result, sound is output from the speaker 30 .
  • the learning unit 235 generates a learned model by subjecting the machine learning model to machine learning.
  • the learning unit 235 generates a model for estimating the user's position (hereinafter referred to as a position estimation model).
  • the position estimation model is a model for estimating the user's position from sounds collected by a microphone as the sensor 10 . The method by which the learning unit 235 creates the position estimation model will be described later in detail.
  • the learning unit 235 also generates a model for estimating characteristic sounds (hereinafter referred to as a characteristic sound estimation model).
  • a feature sound estimation model is a model for estimating whether or not a sound includes a feature sound.
  • a characteristic sound here is a characteristic sound caused by the user's position or movement. The characteristic sound may be the characteristic sound of an unspecified user or the characteristic sound of a specific user.
  • A characteristic sound of an unspecified user is a sound from which an individual user cannot be identified when a plurality of users live in the house H, and is a characteristic sound caused by the positions and movements of the users.
  • the characteristic sound of an unspecified user is the sound of opening and closing the door of the living room, the startup sound or operation sound generated by operating a television or the like.
  • a characteristic sound of a specific user is a characteristic sound caused by the positions and movements of individual users when multiple users live in House H.
  • the characteristic sound of a specific user is the sound uttered by the specific user, footsteps of the specific user, and the like.
  • the feature sound estimation model outputs the result of estimating whether or not the sound input to the model is a feature sound that indicates the position or movement of a specific user.
  • the method by which the learning unit 235 creates the feature sound estimation model will be described later in detail.
  • the learning unit 235 also generates a model for estimating the notification sound (hereinafter referred to as the notification sound estimation model).
  • the notification sound estimation model is a model for estimating the sound to be notified to the user from the sound collected by the microphone as the sensor 10 .
  • the method by which the learning unit 235 creates the notification sound estimation model will be described later in detail.
  • FIG. 4 is a diagram showing an example of stationary sound information 220 in the embodiment.
  • the stationary sound information 220 is information about stationary sounds.
  • a steady sound is a sound that is constantly generated in a space regardless of whether or not the user is present in the space. For example, if the space is a kitchen and the ventilation fan provided in the kitchen is always rotating, the rotating sound becomes the stationary sound.
  • the steady sound information 220 includes, for example, items such as a steady sound identifier (shown as steady sound No in FIG. 4) and frequency characteristics.
  • a stationary sound identifier is identification information such as a number that uniquely identifies a stationary sound.
  • the frequency characteristic is information indicating the frequency characteristic of the stationary sound identified by the stationary sound identifier.
  • the frequency characteristic is, for example, information indicating the magnitude (gain) of the sound component included in the stationary sound for each frequency band.
  • FIG. 5 is a diagram showing an example of the notification sound information 221 in the embodiment.
  • the notification sound information 221 is information regarding notification sounds.
  • the notification sound is a sound to be notified to the user.
  • the notification sound is, for example, a ringing sound that sounds when the intercom is operated.
  • the notification sound information 221 includes, for example, items such as a notification sound identifier (indicated as notification sound No in FIG. 5) and frequency characteristics.
  • the notification sound identifier is identification information such as a number that uniquely identifies the notification sound.
  • the frequency characteristic is information indicating the frequency characteristic of the notification sound specified by the notification sound identifier.
  • The notification sound information 221 may be generated for each user. For example, when a parent and a child live in the house H, the child's voice is a notification sound for the parent who is the user.
  • FIG. 6 and 7 are diagrams showing examples of the installation information 222 in the embodiment.
  • the installation information 222 is information indicating the installation positions of the sensor 10 and the speaker 30 installed in the house H, respectively.
  • FIG. 6 shows an example of installation information 222A indicating the installation position of the sensor 10.
  • FIG. 7 shows an example of installation information 222B indicating the installation position of the speaker 30.
  • the installation information 222A shown in FIG. 6 includes, for example, items such as a sensor identifier (indicated as sensor No in FIG. 6) and installation position.
  • the sensor identifier is identification information such as a number that uniquely identifies the sensor 10 .
  • the installation position is information indicating the installation position of the sensor 10 specified by the sensor identifier.
  • the installation information 222B shown in FIG. 7 includes, for example, items such as a speaker identifier (shown as speaker No in FIG. 7) and installation position.
  • the speaker identifier is identification information such as a number that uniquely identifies the speaker 30 .
  • the installation position is information indicating the installation position of the speaker 30 specified by the speaker identifier.
  • FIGS. 8 and 9 are diagrams for explaining the processing performed by the identifying unit 231 in the embodiment.
  • the specifying unit 231 includes, for example, a plurality of stationary sound reducing units 2310, 2311, 2312, a characteristic sound extracting unit 2313, and a position specifying unit 2314.
  • Information detected by each of the plurality of sensors 10 is input to each of the plurality of stationary sound reduction units 2310 , 2311 , and 2312 .
  • When the plurality of stationary sound reduction units 2310, 2311, and 2312 are not distinguished from each other, they are referred to as the stationary sound reduction unit 231N.
  • the steady sound reduction unit 231N reduces the steady sound from the sound detected by the sensor 10.
  • For example, the stationary sound reduction unit 231N periodically acquires the sound detected by the sensor 10, performs a Fourier transform on the acquired signal indicating the change in the sound over time, and generates a sensor sound frequency characteristic indicating the frequency characteristic of that signal.
  • the stationary sound reduction unit 231N refers to the stationary sound information 220, acquires the frequency characteristics of the stationary sound, and subtracts the acquired frequency characteristics of the stationary sound from the sensor sound frequency characteristics.
  • the steady sound reduction unit 231N may filter the sound detected by the sensor 10 to reduce the steady sound.
  • the filter here is a filter having characteristics that reduce the frequency band corresponding to stationary sound.
  • the stationary sound information 220 includes items such as filter characteristics.
  • the filter characteristic indicates the characteristic of the filter that reduces the frequency component corresponding to the stationary sound. Filter characteristics are, for example, information indicating the filter configuration and coefficients when the filter is a digital filter.
  • the stationary sound reduction unit 231N refers to the stationary sound information 220, acquires filter characteristics for reducing stationary sounds, and generates a filter for reducing stationary sounds based on the acquired filter characteristics.
  • the steady sound reduction unit 231N reduces the steady sound from the sound detected by the sensor 10 by applying the generated filter to the sound detected by the sensor 10 .
  • the steady sound reduction unit 231N outputs the sound obtained by reducing the steady sound from the sound detected by the sensor 10 to the characteristic sound extraction unit 2313 and the position specifying unit 2314.
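A minimal sketch of the spectral-subtraction variant described above, assuming the stationary sound information holds a magnitude spectrum sampled on the same frequency grid as one sensor frame; the function name and framing are assumptions.

```python
import numpy as np

def reduce_stationary_sound(frame: np.ndarray, stationary_magnitude: np.ndarray) -> np.ndarray:
    """Subtract the stored stationary-sound magnitude from one frame of sensor sound."""
    spectrum = np.fft.rfft(frame)                                  # sensor sound frequency characteristic
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    reduced = np.maximum(magnitude - stationary_magnitude, 0.0)    # clip negative magnitudes to zero
    return np.fft.irfft(reduced * np.exp(1j * phase), n=len(frame))
```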
  • the characteristic sound extraction unit 2313 acquires the sound with the stationary sound reduced from the stationary sound reduction unit 231N.
  • the characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound, and outputs the determination result.
  • the characteristic sound extraction unit 2313 uses, for example, a characteristic sound estimation model to determine whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound.
  • the characteristic sound extraction unit 2313 inputs the sound obtained from the stationary sound reduction unit 231N to the characteristic sound estimation model, thereby obtaining the result obtained from the characteristic sound estimation model.
  • the result obtained from the feature sound estimation model is the result of estimating whether or not the sound input to the model contains the feature sound.
  • the characteristic sound extraction unit 2313 outputs the result obtained from the characteristic sound estimation model to the position specifying unit 2314 .
  • the position specifying unit 2314 acquires the sound with the stationary sound reduced from the stationary sound reducing unit 231N. Also, the position specifying unit 2314 acquires information indicating whether or not the sound includes a characteristic sound from the characteristic sound extracting unit 2313 . If the sound includes a characteristic sound, the position specifying unit 2314 estimates the position and movement of the user based on the sound acquired from each of the steady sound reduction units 231N.
  • FIG. 9 schematically shows four microphones provided as the sensors 10-1 to 10-4 at the four corners of a space in the house H.
  • FIG. 9 also schematically shows the user moving from the position indicated by U# to the position indicated by U in the center of the space.
  • the characteristic sound (the user's moving sound) acquired by the microphones of the sensors 10-1 and 10-2 gradually decreases.
  • the characteristic sound (the user's moving sound) acquired by the microphones of the sensors 10-3 and 10-4 gradually increases.
  • the position specifying unit 2314 detects the position and movement of the user based on such changes in the loudness of the characteristic sound.
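A minimal sketch of turning the per-microphone loudness of the characteristic sound into a position and a movement direction, assuming known microphone coordinates and a simple level-weighted centroid; the publication does not prescribe a specific formula, so this is only one plausible realisation.

```python
import numpy as np

# Hypothetical coordinates (x, y) of the microphones serving as sensors 10-1 to 10-4.
MIC_POSITIONS = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 4.0], [5.0, 4.0]])

def estimate_user_position(feature_levels: np.ndarray) -> np.ndarray:
    """Weight each microphone position by the level of the characteristic sound it picked up."""
    weights = feature_levels / feature_levels.sum()
    return weights @ MIC_POSITIONS

def estimate_movement(levels_before: np.ndarray, levels_now: np.ndarray) -> np.ndarray:
    """The movement is the change of the estimated position between two observations."""
    return estimate_user_position(levels_now) - estimate_user_position(levels_before)
```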
  • Stationary sound is, for example, sound (such as noise) that is constantly occurring at a certain position.
  • a stationary sound is detected by averaging the frequency characteristics of sounds collected for a certain period of time (for example, one hour) by a microphone provided at that position.
  • Stationary sounds that are detected in this way include, for example, sounds that have the same frequency and are continuously output for a certain period of time, the roar of a ventilation fan, and natural sounds such as flowing water.
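A minimal sketch of building such a stationary-sound profile by averaging the frequency characteristics of frames collected over a period, under the assumption that frame-wise FFT magnitudes are a suitable representation:

```python
import numpy as np

def estimate_stationary_profile(frames: list[np.ndarray]) -> np.ndarray:
    """Average the magnitude spectra of frames collected over e.g. one hour at one microphone."""
    spectra = [np.abs(np.fft.rfft(frame)) for frame in frames]
    return np.mean(spectra, axis=0)   # later stored per location as stationary sound information 220
```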
  • FIG. 10 shows that a steady sound is output from the sound source SA1, and the frequency characteristic of the steady sound is the characteristic N1.
  • each of the microphones installed as the sensor 10 in the house H may detect in advance the steady sound of the location where the microphone is installed.
  • Information in which the steady sound detected by each microphone is associated with the location where the sensor 10 is provided may be stored in the storage unit 22 as the steady sound information 220 .
  • the steady sound reduction unit 231N may reduce the steady sound corresponding to the location where the sensor 10 is installed from the sounds detected by the sensor 10 .
  • a characteristic sound is a sound having a characteristic different from that of a steady sound.
  • a stationary sound is a sound that has similar frequency components and is continuously output.
  • a characteristic sound is a sound caused by the presence of a user or the like, and is a sound that is suddenly output in a short period of time.
  • the characteristic sound is a sound corresponding to the human voice extracted based on the formant structure (the frequency characteristics of the voice of the uttering person).
  • FIG. 11 shows that a characteristic sound such as a voice is output from the user SA2, and that the frequency characteristic of that characteristic sound is the characteristic N2.
  • FIG. 11 also shows that a characteristic sound such as a cry is output from the animal SA3, and that the frequency characteristic of that characteristic sound is the characteristic N3.
  • the characteristic sound may include a sine wave that does not exist in nature, such as a touch panel operation sound. For example, if the touch panel operation sound indicates that there is a user operating the touch panel, the touch panel operation sound is the characteristic sound.
  • Information related to the characteristic sound may be stored in the storage unit 22.
  • the individual corresponding to the characteristic sound may be identified.
  • the harmonic structure (frequency characteristics) of each voice of a plurality of users is detected in advance. Multiple users, etc. may include family members, housemates, and animals such as pets.
  • the detected harmonic structure of each voice is stored in the storage unit 22 as the characteristic sound of each user.
  • the operation sound of the touch panel is stored in the storage unit 22 as the characteristic sound of the specific individual user.
  • the characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes the characteristic sound based on the frequency characteristics of the characteristic sounds stored in the storage unit 22 .
  • FIG. 12 and 13 are diagrams for explaining the processing performed by the control device 20 of the embodiment.
  • In the example of FIGS. 12 and 13, microphones are installed as the sensors 10-1 to 10-4 at four positions in the house H.
  • Speakers 30-1 to 30-4 are installed near the respective microphones.
  • a steady sound is output from the sound source SA1.
  • the user SA2 exists near the sensor 10-1.
  • an animal SA3 exists in the vicinity of the sensor 10-3.
  • user SA4 exists near sensor 10-4.
  • the cry of the animal SA3 is detected as a notification sound and notified to the user SA2.
  • the stationary sound and the voice of the user SA2 are detected by the microphone of the sensor 10-1. Only stationary sounds are detected by the microphone of the sensor 10-2.
  • the microphone of the sensor 10-3 detects the stationary sound and the cry of the animal SA3.
  • the stationary sound and the voice of the user SA4 are detected by the microphone of the sensor 10-4.
  • the microphone of the sensor 10-1 detects a sound exhibiting a frequency characteristic T1 obtained by synthesizing the stationary sound and the voice of the user SA2.
  • the microphone of the sensor 10-2 detects the sound showing the frequency characteristic T2 of only the stationary sound.
  • the microphone of the sensor 10-3 detects a sound exhibiting a frequency characteristic T3 in which the stationary sound and the cry of the animal SA3 are combined.
  • The microphone of the sensor 10-4 detects a sound exhibiting a frequency characteristic T4 in which the stationary sound and the voice of the user SA4 are combined.
  • the steady sound reduction unit 231N reduces the steady sound from the sounds detected by the sensors 10-1 to 10-4.
  • the voice of user SA2 is extracted from the sound detected by the microphone of sensor 10-1.
  • Nothing is extracted from the sound detected by the microphone of sensor 10-2.
  • Only the cry of the animal SA3 is extracted from the sounds detected by the microphone of the sensor 10-3.
  • Only the voice of user SA4 is extracted from the sound detected by the microphone of sensor 10-4.
  • a characteristic sound is extracted from the sounds extracted from each of the sensors 10-1 to 10-4 by the characteristic sound extraction unit 2313.
  • the voice of the user SA2 is extracted as the characteristic sound from the sound detected by the microphone of the sensor 10-1.
  • No characteristic sound is extracted from the sound detected by the microphone of the sensor 10-2.
  • the bark of the animal SA3 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-3.
  • the voice of the user SA4 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-4.
  • the position specifying unit 2314 specifies that the user SA2 exists at the position of the sensor 10-1 (near the position of the sensor 10-1). Position specifying unit 2314 specifies that animal SA3 exists at the position of sensor 10-3 (near the position of sensor 10-3). Position specifying unit 2314 specifies that user SA4 exists at the position of sensor 10-4 (near the position of sensor 10-4).
  • the selection unit 232 selects the speaker 30-1 installed near the location where the user to be notified, that is, the user SA2 exists.
  • the output unit 233 causes the speaker 30-1 selected by the selection unit 232 to output the cry of the animal SA3 extracted from the microphone of the sensor 10-3.
  • the steady sound reduction unit 231N outputs the sound detected by the microphone as it is to the characteristic sound extraction unit 2313 without reducing the steady sound when the level of the steady sound is greater than a predetermined threshold.
  • When the feature sound extraction unit 2313 extracts the feature sound from the sound detected by the microphone using a feature sound estimation model, it may use different feature sound estimation models depending on whether or not the stationary sound has been reduced. For example, when the stationary sound has been reduced, the feature sound extraction unit 2313 uses a model for estimating the feature sound from a sound from which the stationary sound has been reduced. On the other hand, when the stationary sound has not been reduced, the feature sound extraction unit 2313 uses a model for estimating the feature sound from a sound from which the stationary sound has not been reduced.
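A minimal sketch of switching between the two feature sound estimation models; the model objects and their predict() interface are assumed for illustration.

```python
def extract_feature_sound(sound, stationary_reduced: bool, model_reduced, model_raw):
    """Use the model trained on stationary-reduced sound only when reduction was actually applied."""
    model = model_reduced if stationary_reduced else model_raw
    return model.predict(sound)   # hypothetical interface returning the estimation result
```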
  • FIG. 14 is a flowchart explaining the flow of processing performed by the control device 20 of the embodiment.
  • The control device 20 acquires information (sensor information) acquired by the sensor 10 (step S1). When the sensor information includes sound, the control device 20 determines whether or not the sound is a notification sound (step S2). When the sensor 10 is a microphone, or when some of the plurality of sensors 10 are microphones, the control device 20 determines whether or not the sound is a notification sound by, for example, comparing the frequency characteristics of the sound collected by the microphone with those of the sounds stored in the storage unit 22 as the notification sound information 221.
  • When determining that the sound is a notification sound, the control device 20 extracts the sensor information used to identify the user's position (step S3). For example, when specifying the user's position using an image, the control device 20 extracts information of the image acquired by the image sensor. Alternatively, when the user's position is specified using temperature, the control device 20 extracts information indicating the temperature acquired by the infrared sensor or the temperature sensor. When identifying the position of the user using sound, the control device 20 extracts information indicating the sound collected by the microphone. In the following flow, a case will be described in which the control device 20 identifies the position of the user using sound.
  • the control device 20 reduces stationary sounds from the sounds collected by the microphone (step S4).
  • the control device 20 extracts a characteristic sound from the sound from which the stationary sound has been reduced (step S5).
  • the control device 20 identifies the position of the user from the extracted characteristic sounds (step S6).
  • the control device 20 selects the speaker 30 for outputting the notification sound based on the identified position of the user (step S7).
  • the control device 20 outputs a notification sound from the selected speaker 30 (step S8).
  • If it is determined in step S2 that the sound is not a notification sound, the control device 20 ends the process.
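A minimal sketch of the flow of FIG. 14 for the sound-based case; every helper function below is a hypothetical placeholder for the corresponding step, not an API from the publication.

```python
def control_flow(sensor_information: dict) -> None:
    """Steps S1-S8 of FIG. 14, assuming sound-based position identification."""
    sound = sensor_information.get("sound")                    # S1: acquire sensor information
    if sound is None or not looks_like_notification(sound):    # S2: is it a notification sound?
        return                                                 # no: end the process
    mic_sounds = collect_microphone_sounds(sensor_information)                 # S3: extract sensor info
    reduced = {mic: subtract_stationary(s) for mic, s in mic_sounds.items()}   # S4: reduce stationary sound
    features = {mic: pick_feature_sound(s) for mic, s in reduced.items()}      # S5: extract feature sound
    user_position = locate_user(features)                      # S6: identify the user's position
    speaker_no = select_speaker(user_position)                 # S7: select the nearby speaker
    play_on_speaker(speaker_no, sound)                         # S8: output the notification sound
```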
  • FIG. 15 is a flowchart illustrating the flow of processing for generating a machine learning model according to the embodiment.
  • The position estimation model is a model that estimates the user's position.
  • Here, a case where the position estimation model estimates whether or not the sound collected by the microphone is a sound caused by the user will be described as an example.
  • the learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11).
  • the learning unit 235 generates a learning data set (step S12).
  • the learning data set here is information in which a label indicating whether or not the sound is caused by the user is added to the sound collected by the microphone.
  • The learning unit 235 determines whether or not all the sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not been acquired, the learning unit 235 returns to step S11.
  • When all the sensor information for machine learning has been acquired, the learning unit 235 generates a position estimation model (learned model) (step S14).
  • the learning unit 235 generates a position estimation model (learned model) by having a machine learning model such as a CNN (Convolutional Neural Network) learn the learning data set.
  • When the learning unit 235 inputs a sound collected by the microphone from the learning data set into a machine learning model such as a CNN, the machine learning model is repeatedly trained on the learning data set while its parameters are adjusted so that the value output from the model approaches the label attached to that sound (the label indicating whether or not the sound is caused by the user). This makes it possible to generate a model capable of accurately estimating whether or not a sound collected by the microphone is caused by the user.
  • the learning unit 235 stores the generated position estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
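A minimal sketch of such a training step, assuming PyTorch, a tiny 1-D CNN over fixed-length sound feature frames, and binary labels; the architecture and hyperparameters are illustrative only and are not taken from the publication.

```python
import torch
import torch.nn as nn

class SoundLabelCNN(nn.Module):
    """Tiny 1-D CNN mapping a sound feature frame to a single logit (label: caused by the user or not)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(1, 16, kernel_size=5, padding=2),
                                  nn.ReLU(),
                                  nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Linear(16, 1)

    def forward(self, x):                      # x: (batch, 1, n_feature_bins)
        return self.fc(self.conv(x).squeeze(-1))

def train_position_model(model: nn.Module, loader, epochs: int = 10) -> None:
    """Adjust the parameters so the model output approaches the attached label (step S14)."""
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for sounds, labels in loader:          # labels: 1.0 if the sound is caused by the user
            optimiser.zero_grad()
            loss = loss_fn(model(sounds).squeeze(-1), labels.float())
            loss.backward()
            optimiser.step()
```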
  • a feature sound estimation model is a model for estimating a feature sound.
  • Here, a case where the characteristic sound estimation model estimates whether or not the sound collected by the microphone is a characteristic sound will be described as an example.
  • the learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11).
  • the learning unit 235 generates a learning data set (step S12).
  • the learning data set here is information in which a label indicating whether or not the sound is a characteristic sound is attached to the sound collected by the microphone.
  • The learning unit 235 determines whether or not all the sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not been acquired, the learning unit 235 returns to step S11.
  • When all the sensor information for machine learning has been acquired, the learning unit 235 generates a feature sound estimation model (learned model) (step S14).
  • the learning unit 235 generates a feature sound estimation model (learned model) by having a machine learning model such as CNN learn the learning data set.
  • When the learning unit 235 inputs a sound collected by the microphone from the learning data set into a machine learning model such as a CNN, the machine learning model is repeatedly trained on the learning data set while its parameters are adjusted so that the value output from the model approaches the label attached to that sound (the label indicating whether or not the sound is a characteristic sound). This makes it possible to generate a model capable of accurately estimating whether or not a sound collected by the microphone is a characteristic sound.
  • the learning unit 235 stores the generated feature sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
  • the notification sound estimation model is a model for estimating the notification sound.
  • the notification sound estimation model estimates whether or not the sound collected by the microphone is the notification sound.
  • the learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11).
  • the learning unit 235 generates a learning data set (step S12).
  • the learning data set here is information in which a label indicating whether or not the sound is a notification sound is attached to the sound collected by the microphone.
  • The learning unit 235 determines whether or not all the sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not been acquired, the learning unit 235 returns to step S11.
  • When all the sensor information for machine learning has been acquired, the learning unit 235 generates a notification sound estimation model (learned model) (step S14).
  • the learning unit 235 generates a notification sound estimation model (learned model) by causing a machine learning model such as CNN to perform machine learning on the learning data set.
  • When the learning unit 235 inputs a sound collected by the microphone from the learning data set into a machine learning model such as a CNN, the machine learning model is repeatedly trained on the learning data set while its parameters are adjusted so that the value output from the model approaches the label attached to that sound (the label indicating whether or not the sound is a notification sound). This makes it possible to generate a model capable of accurately estimating whether or not a sound collected by the microphone is a notification sound.
  • the learning unit 235 stores the generated notification sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
  • the control device 20 controls sounds to be output from each of the plurality of speakers 30 provided in the space.
  • the control device 20 includes an acquisition unit 230 , an identification unit 231 , a selection unit 232 and an output unit 233 .
  • the acquisition unit 230 acquires sensor information (detection results) detected by the plurality of sensors 10 provided in the space.
  • the specifying unit 231 specifies the position of the user existing in the space based on the detection result acquired by the acquiring unit 230 .
  • the selection unit 232 selects the speaker 30 installed near the position of the user identified by the identification unit 231 from the plurality of speakers 30 .
  • the selection unit 232 may select the speaker 30 installed closest to the user's position from among the plurality of speakers 30 .
  • the output unit 233 outputs sound from the speaker selected by the selection unit 232 .
  • the specifying unit 231 specifies the user's position based on the output obtained by inputting the detection result into the position estimation model.
  • The position estimation model is a trained model that has learned, using a learning data set in which each detection result is labeled as to whether or not it was detected due to the presence of the user, the correspondence between a detection result and whether or not that detection result was detected due to the presence of the user.
  • the sensor 10 may be a microphone.
  • the specifying unit 231 specifies the user position based on the output obtained by inputting the detection result into the position estimation model.
  • The position estimation model is a model generated by performing learning using a learning data set in which each sound for machine learning is labeled as to whether or not it is caused by the presence of the user, and is a model for estimating whether or not a sound detected by the microphone is caused by the presence of the user.
  • The sensor 10 may be a microphone.
  • When the identifying unit 231 detects the user's voice in the sound collected by the microphone, the identifying unit 231 determines that the user is present in the vicinity of that microphone, and specifies the location of the microphone as the location of the user.
  • the sensor 10 may include a microphone.
  • the specifying unit 231 determines whether or not the sound is a notification sound to notify the user based on the characteristics of the sound collected by the microphone.
  • the output unit 233 causes the speaker 30 selected by the selection unit 232 to output the sound determined to be the notification sound by the identification unit 231 (an example of the determination unit).
  • the sensor 10 may include a microphone.
  • the identification unit 231 determines whether or not the sound collected by the microphone is the notification sound based on the output obtained by inputting the sound collected by the microphone into the notification sound estimation model.
  • the notification sound estimation model is a model generated by performing learning using a learning data set in which a machine learning sound is labeled to indicate whether or not the sound is a notification sound to be notified to the user. This is a model for estimating whether or not a sound detected by a microphone is a notification sound.
  • FIG. 16 is a flowchart for explaining the flow of processing for performing additional learning according to Modification 1 of the embodiment.
  • the control device 20 acquires sensor information for estimation, here information about sound collected by the microphone (step S21).
  • the controller 20 uses the notification sound estimation model (learned model) to estimate whether or not the sound collected by the microphone is the notification sound (step S22).
  • the control device 20 determines whether or not the estimation by the notification sound estimation model is correct (step S23).
  • the control device 20 determines whether or not the estimation by the notification sound estimation model is correct, for example, based on information input by the user operating a keyboard or the like.
  • If the estimation by the notification sound estimation model is incorrect, the control device 20 generates a learning data set for additional learning (step S24).
  • the learning data set for additional learning is information in which correct labels are attached to sounds that are incorrectly estimated by the notification sound estimation model, indicating whether or not the sounds are notification sounds.
  • the control device 20 determines whether or not to perform additional learning (step S25). For example, the control device 20 determines to perform additional learning when the number of learning data sets for additional learning generated in step S24 reaches a predetermined number. Alternatively, the control device 20 may determine to perform additional learning when the probability that the notification sound estimation model makes an erroneous estimation is greater than or equal to a predetermined value.
  • When performing additional learning, the control device 20 performs additional learning by training the model on the learning data set for additional learning, and updates the notification sound estimation model (learned model) (step S26). The control device 20 stores the updated notification sound estimation model (learned model) (step S27).
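A minimal sketch of the additional-learning loop of FIG. 16, assuming the user's correction arrives as a boolean and that retraining and saving are handled by hypothetical helpers.

```python
def additional_learning_step(model, mic_sound, user_says_notification: bool,
                             buffer: list, batch_size: int = 32) -> None:
    """Steps S21-S27: collect wrongly estimated examples and retrain once enough have accumulated."""
    estimated = estimate_notification_sound(model, mic_sound)      # S22 (hypothetical helper)
    if estimated != user_says_notification:                        # S23: estimation was incorrect
        buffer.append((mic_sound, user_says_notification))         # S24: attach the correct label
    if len(buffer) >= batch_size:                                  # S25: decide to perform additional learning
        retrain(model, buffer)                                     # S26: update the learned model
        buffer.clear()
        save_model(model)                                          # S27: store the updated model
```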
  • In this way, the acquisition unit 230 obtains the result of the user's determination as to whether or not the sound output by the output unit 233 is a notification sound.
  • a learning unit 235 performs additional learning for the notification sound estimation model based on the determination result acquired by the acquisition unit 230 .
  • additional learning can be performed on the notification sound estimation model, and if there is an error in the estimation, the error in the estimation can be corrected.
  • the added notification sound can be machine-learned by the notification sound estimation model.
  • control device 20 can also perform additional learning on the position estimation model and the feature sound estimation model using a similar method.
  • FIG. 17 shows an example in which a plurality of users U1 and U2 are present in a house H.
  • This modification differs from the above-described embodiment in that the user U1 possesses a transmitting terminal T.
  • The transmitting terminal T is a terminal device that transmits a signal indicating the presence of the user U1, and is, for example, a smartphone, a beacon terminal, or the like.
  • Acquisition unit 230 acquires a signal (position signal) transmitted from transmitting terminal T.
  • The specifying unit 231 specifies the position of the user U1 (that is, the position of the transmitting terminal T) based on the signal (position signal) acquired by the acquiring unit 230.
  • a plurality of users U1 and U2 may exist in the space. At least one user U1 among the plurality of users U1 and U2 possesses a transmission terminal T that transmits a position signal indicating the position of user U1.
  • Acquisition unit 230 acquires a signal (position signal) transmitted from transmitting terminal T.
  • The specifying unit 231 specifies the position of the user U1 based on the signal (position signal) acquired by the acquisition unit 230.
  • In this way, the position of the user U1 possessing the transmitting terminal T can be accurately specified based on the signal transmitted from the transmitting terminal T.
  • Modification 3 of the embodiment will now be described. In this modified example, it is assumed that the user listens to a sound that continues for a predetermined period of time, such as music. This modification differs from the above-described embodiment in that, when the user moves in the space, the sound is output following that movement.
  • In this modification, the identification unit 231 determines, based on the characteristics of the sound collected by the microphone, whether the sound is a sound to be output following the movement of the user (that is, a sound for which the output speaker 30 is changed following the movement of the user). If the sound collected by the microphone is a sound to be output following the movement of the user, the identification unit 231 identifies the user near the microphone that collected the sound. For example, the identification unit 231 identifies the user near the microphone based on a characteristic sound collected together with the sound to be output following the movement of the user. The position of the user is repeatedly specified while the sound to be output following the movement of the user is being acquired by the microphone.
  • the selection unit 232 selects the speaker 30 installed near the position identified this time.
  • the output unit 233 stops the sound output from the speaker 30 selected last time by the selection unit 232, and causes the speaker 30 selected this time to output the sound following the movement of the user.
  • the sensor 10 may include a microphone.
  • the specifying unit 231 determines whether or not the sound collected by the microphone is the sound to be output following the movement of the user.
  • the identifying unit 231 identifies a user near the microphone that has collected the sound determined to be the sound to be output following the movement of the user.
  • the specifying unit 231 associates the specified position of the user with the sound to be output following the movement of the user, and repeatedly specifies the position of the user.
  • the selection unit 232 selects the speaker 30 installed near the position of the user identified this time.
  • the output unit 233 stops the sound output from the speaker 30 selected last time by the selection unit 232, and outputs the sound to follow the movement of the user from the speaker 30 selected this time.
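A minimal sketch of handing playback over between speakers as the user moves, reusing the nearest-speaker selection sketched earlier; the speaker objects and their play()/stop() methods are assumptions.

```python
class FollowingPlayback:
    """Keeps a long-running sound (e.g. music) on the speaker nearest the user's latest position."""
    def __init__(self, speakers: dict):
        self.speakers = speakers          # speaker No -> object with play(stream) / stop()
        self.current = None               # speaker selected last time

    def update(self, user_position, stream) -> None:
        selected = select_speaker(user_position)        # speaker selected this time
        if selected != self.current:
            if self.current is not None:
                self.speakers[self.current].stop()      # stop output from the previously selected speaker
            self.speakers[selected].play(stream)        # continue the sound on the newly selected speaker
            self.current = selected
```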
  • All or part of the speaker system 1 and the control device 20 in the above-described embodiment may be realized by a computer.
  • a program for realizing this function may be recorded in a computer-readable recording medium, and the program recorded in this recording medium may be read into a computer system and executed.
  • the “computer system” here includes hardware such as an OS and peripheral devices.
  • the term "computer-readable recording medium” refers to portable media such as flexible discs, magneto-optical discs, ROMs and CD-ROMs, and storage devices such as hard discs incorporated in computer systems.
  • The "computer-readable recording medium" may also include a medium that dynamically retains the program for a short period of time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that retains the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or client in that case. Further, the program may be one for realizing a part of the functions described above, or may be one that realizes the functions described above in combination with a program already recorded in the computer system. The functions may also be realized using a programmable logic device such as an FPGA.
  • the present disclosure may be applied to control methods, control devices, and programs.

Abstract

This control method is implemented by a computer and comprises: acquiring a detection result detected by at least one sensor placed in a space; determining the position of a user in the space on the basis of the detection result; selecting, from among a plurality of speakers placed in the space, a speaker placed in the vicinity of the determined position of the user; and causing the selected speaker to output sounds.

Description

Control method, control device, and program
The present invention relates to a control method, a control device, and a program.
This application claims priority based on Japanese Patent Application No. 2021-052696 filed on March 26, 2021, the entire disclosure of which is incorporated herein.
For example, Patent Literature 1 discloses a technology related to an information processing apparatus capable of responding in accordance with the user's intention.
International Publication No. WO 2019/082630
In the technology described in Patent Literature 1, feedback from the home console is output as voice through the user's interaction with a home console installed at a predetermined location. For example, when the user utters "Please heat the bath" and the bath has not finished heating by the time the user comes home, that fact is fed back to the user. However, the user is likely to move away after giving an instruction to the home console at a certain location. In such a case, since the user is no longer near the home console, there is a problem that the feedback from the home console cannot be heard.
The present invention has been made in view of such circumstances. An example of an object of the present invention is to provide a control method, a control device, and a program capable of selecting a speaker as the sound output destination so that the user can hear a sound notified to the user even when the user moves.
In order to solve the above-described problems, one aspect of the present invention is a control method performed by a computer, the method including: acquiring a detection result detected by at least one sensor provided in a space; specifying a position of a user in the space based on the detection result; selecting, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and causing the selected speaker to output sound.
Another aspect of the present invention is a control device including: an acquisition unit that acquires a detection result detected by at least one sensor provided in a space; a specifying unit that specifies a position of a user in the space based on the detection result; a selection unit that selects, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and an output unit that causes the selected speaker to output sound.
Another aspect of the present invention is a program that causes a computer to execute: acquiring a detection result detected by at least one sensor provided in a space; identifying the position of a user in the space based on the detection result; selecting, from among a plurality of speakers provided in the space, a speaker installed in the vicinity of the identified position of the user; and causing the selected speaker to output a sound.
According to the embodiments of the present invention, even when the user has moved, it is possible to select a speaker as the output destination of a sound so that the user can hear the sound notified to the user.
FIG. 1 is a diagram showing an overview of a speaker system 1 according to an embodiment.
FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 according to the embodiment.
FIG. 3 is a block diagram showing an example of the configuration of a control device 20 according to the embodiment.
FIG. 4 is a diagram showing an example of stationary sound information 220 according to the embodiment.
FIG. 5 is a diagram showing an example of notification sound information 221 according to the embodiment.
FIG. 6 is a diagram showing an example of installation information 222 according to the embodiment.
FIG. 7 is a diagram showing an example of the installation information 222 according to the embodiment.
FIG. 8 is a diagram for explaining processing performed by an identification unit 231 according to the embodiment.
FIG. 9 is a diagram for explaining processing performed by the identification unit 231 according to the embodiment.
FIG. 10 is a diagram for explaining a stationary sound according to the embodiment.
FIG. 11 is a diagram for explaining a characteristic sound according to the embodiment.
FIG. 12 is a diagram for explaining processing performed by the control device 20 according to the embodiment.
FIG. 13 is a diagram for explaining processing performed by the control device 20 according to the embodiment.
FIG. 14 is a flowchart for explaining the flow of processing performed by the control device 20 according to the embodiment.
FIG. 15 is a diagram for explaining processing of generating a learning model according to the embodiment.
FIG. 16 is a diagram for explaining additional learning according to Modification 1 of the embodiment.
FIG. 17 is a diagram for explaining Modification 2 of the embodiment.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a diagram showing an overview of the speaker system 1 according to the embodiment. As shown in the example of FIG. 1, the speaker system 1 is applied in a space where a user lives, such as a house H. When the speaker system 1 is applied to the house H, the house H is provided with a plurality of sensors 10-1 to 10-3, a control device 20, and a plurality of speakers 30-1 to 30-3. The sensors 10-1 to 10-3 (and sensors 10-1 to 10-N to be described later) are referred to as the sensors 10 when they are not distinguished from each other. The speakers 30-1 to 30-3 (and speakers 30-1 to 30-M to be described later) are referred to as the speakers 30 when they are not distinguished from each other. Each of the plurality of sensors 10 is provided in a different space in the house H. Specifically, the sensors 10 are provided in the entrance, the kitchen, the living room, the bedroom, and the like of the house H. Likewise, each of the plurality of speakers 30 is provided in a different space in the house H. A sensor 10 and a speaker 30 may both be provided in the same space. Alternatively, only one of a sensor 10 and a speaker 30 may be provided in a space such as the entrance, the kitchen, the living room, or the bedroom of the house H. The number of sensors 10 and the number of speakers 30 connected to the control device 20 may be determined arbitrarily.
FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 according to the embodiment. As shown in FIG. 2, the speaker system 1 includes, for example, a plurality of sensors 10-1 to 10-N, the control device 20, and a plurality of speakers 30-1 to 30-M. In the speaker system 1, the plurality of sensors 10, the control device 20, and the plurality of speakers 30 are communicably connected via a communication network NW.
The communication network NW may be a wide area network, that is, a WAN (Wide Area Network), the Internet, or a combination thereof. The communication network NW may be a network that is communicably connected by a wired connection such as a cable or by a wireless connection such as a wireless LAN.
The sensor 10 includes, for example, a sensor unit and a communication unit. The sensor unit acquires information from which the position and movement of the user can be detected. The communication unit transmits the information acquired by the sensor unit to the control device 20.
In this embodiment, the sensor unit is a microphone (see, for example, FIG. 9). When the sensor unit is a microphone, the sensor unit collects sound propagating in the space in which the sensor unit is provided. The position and movement of the user can be detected by analyzing the sound collected by the microphone. A method of detecting the position and movement of the user based on the sound collected by the microphone will be described later in detail.
The sensor unit is not limited to a microphone. The sensor unit may be any sensor capable of acquiring information from which the position and movement of the user can be detected. For example, the sensor unit may be an image sensor, an infrared sensor, a temperature sensor, an optical sensor, or the like. For example, when the sensor unit is an image sensor, the sensor unit captures an image of the space in which the sensor 10 is provided, and the position and movement of the user can be detected by analyzing the image. When the sensor unit is an infrared sensor, a temperature sensor, an optical sensor, or the like, the sensor unit detects, for example, a temperature measured by infrared light or a thermometer, or the phase difference between emitted light and its reflected light. The position and movement of the user can be detected based on changes in the temperature, the phase difference, and the like.
The speaker 30 is connected to the control device 20 and outputs sound under the control of the control device 20. The speaker 30 may be any speaker device as long as it can output sound at least under the control of the control device 20.
The control device 20 is, for example, a computer device such as a PC (Personal Computer) or a server device. The control device 20 receives the information acquired by each of the sensors 10. The control device 20 identifies the position of the user based on the acquired information. The control device 20 causes a speaker 30 provided near the identified position of the user to output sound.
FIG. 3 is a block diagram showing an example of the configuration of the control device 20 according to the embodiment. The control device 20 includes, for example, a communication unit 21, a storage unit 22, and a control unit 23. The communication unit 21 is realized by, for example, a general-purpose communication IC (Integrated Circuit). The communication unit 21 communicates with the sensors 10 and the speakers 30 via the communication network NW.
The storage unit 22 is realized by, for example, a storage device (a storage device including a non-transitory storage medium) such as an HDD (Hard Disk Drive) or a flash memory, or a combination thereof. The storage unit 22 stores a program for realizing each component (each function) of the control device 20, variables used when executing the program, and various kinds of information. The storage unit 22 stores, for example, stationary sound information 220, notification sound information 221, installation information 222, and learned model information 223. The details of the information stored in the storage unit 22 will be described later.
The control unit 23 is realized by causing a CPU provided as hardware in the control device 20 to execute a program. The control unit 23 includes, for example, an acquisition unit 230, an identification unit 231, a selection unit 232, an output unit 233, a device control unit 234, and a learning unit 235.
The acquisition unit 230 acquires the information acquired by the sensors 10 via the communication unit 21, and outputs the acquired information to the identification unit 231.
The identification unit 231 identifies the position of the user based on the information acquired from the acquisition unit 230. The method by which the identification unit 231 identifies the position of the user will be described later in detail. The identification unit 231 outputs information indicating the identified position of the user to the selection unit 232.
The identification unit 231 also identifies a notification sound based on the information acquired from the acquisition unit 230. The notification sound is a sound that should be notified to the user, for example, a ringing sound of an intercom. The identification unit 231 identifies, as the notification sound, a sound whose frequency characteristic is close to the frequency characteristic of a sound stored in advance in the storage unit 22 as the notification sound information 221. The identification unit 231 is an example of a "determination unit".
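The comparison of frequency characteristics described above can be illustrated with a short sketch. The following Python fragment is only an illustrative example, not the disclosed implementation: it assumes that the notification sound information 221 is available as a set of magnitude spectra and uses cosine similarity with an arbitrary threshold, neither of which is prescribed by this description.

```python
import numpy as np

def is_notification_sound(observed, stored_spectra, threshold=0.9):
    """Return True if the observed magnitude spectrum is close to any spectrum
    registered as notification sound information 221 (hypothetical layout).

    observed       : 1-D array, magnitude spectrum of the collected sound
    stored_spectra : iterable of 1-D arrays of the same length
    threshold      : similarity above which the sound is treated as a
                     notification sound (arbitrary value for illustration)
    """
    obs = observed / (np.linalg.norm(observed) + 1e-12)
    for reference in stored_spectra:
        ref = reference / (np.linalg.norm(reference) + 1e-12)
        if float(np.dot(obs, ref)) >= threshold:
            return True
    return False
```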
The selection unit 232 selects a speaker 30 close to the position of the user based on the information indicating the position of the user acquired from the identification unit 231. For example, the selection unit 232 refers to the installation information 222 based on the information indicating the position of the user. The installation information 222 is information indicating the positions where the speakers 30 are installed. The selection unit 232 compares the position of the user with each of the positions where the speakers 30 are installed, and selects a speaker 30 close to the position of the user. For example, the selection unit 232 selects, from among the plurality of speakers 30, the speaker 30 closest to the user. The selection unit 232 outputs information indicating the selected speaker 30 to the output unit 233.
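As a concrete illustration of this selection, the following sketch assumes that the installation information 222B is available as two-dimensional coordinates for each speaker and that the identified position of the user is expressed in the same coordinate system; the description does not prescribe a particular distance measure, so Euclidean distance is used here.

```python
import math

def select_nearest_speaker(user_pos, speaker_positions):
    """Select the speaker installed closest to the identified user position.

    user_pos          : (x, y) coordinates of the user
    speaker_positions : dict mapping a speaker identifier to its (x, y)
                        installation position (cf. installation information 222B)
    """
    def distance(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    return min(speaker_positions, key=lambda sp: distance(user_pos, speaker_positions[sp]))

# Example: speaker "30-1" is selected for a user standing near (1.0, 2.0).
speakers = {"30-1": (0.5, 2.0), "30-2": (6.0, 2.0), "30-3": (3.0, 8.0)}
print(select_nearest_speaker((1.0, 2.0), speakers))
```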
Based on the information indicating the speaker 30 acquired from the selection unit 232, the output unit 233 causes the speaker 30 indicated by that information to output sound. The sound that the output unit 233 causes the speaker 30 to output may be any sound as long as it should be notified to the user. For example, the output unit 233 causes the speaker 30 to output a notification sound of an intercom or the like, or music that the user is listening to.
The output unit 233 may also set, among the plurality of speakers 30, the speakers 30 that are not outputting sound to a standby mode.
The device control unit 234 comprehensively controls the control device 20. For example, the device control unit 234 outputs the information received by the communication unit 21, that is, the information acquired from the sensors 10, to the acquisition unit 230. The device control unit 234 also outputs the information output by the output unit 233, that is, information indicating the sound to be output by the speaker 30, to the communication unit 21. As a result, the sound is output from the speaker 30.
The learning unit 235 generates a learned model by training a machine learning model. The learning unit 235 generates a model for estimating the position of the user (hereinafter referred to as a position estimation model). The position estimation model is a model for estimating the position of the user from sounds collected by the microphones serving as the sensors 10. The method by which the learning unit 235 creates the position estimation model will be described later in detail.
The learning unit 235 also generates a model for estimating a characteristic sound (hereinafter referred to as a characteristic sound estimation model). The characteristic sound estimation model is a model for estimating whether or not a sound includes a characteristic sound. A characteristic sound here is a characteristic sound caused by the position or movement of a user. The characteristic sound may be a characteristic sound of an unspecified user or a characteristic sound of a specific user.
A characteristic sound of an unspecified user is a sound that, when a plurality of users live in the house H, does not allow an individual user to be identified, but is nevertheless a characteristic sound caused by the position or movement of a user. For example, characteristic sounds of an unspecified user include the sound of the living-room door being opened or closed, and the startup sound or operating sound generated when a television or the like is operated.
A characteristic sound of a specific user is a characteristic sound caused by the position or movement of an individual user when a plurality of users live in the house H. For example, characteristic sounds of a specific user include a sound uttered by the specific user and the footsteps of the specific user.
For a sound input to the model, the characteristic sound estimation model outputs an estimation result indicating whether or not the sound is a characteristic sound indicating the position or movement of a specific user. The method by which the learning unit 235 creates the characteristic sound estimation model will be described later in detail.
The learning unit 235 also generates a model for estimating a notification sound (hereinafter referred to as a notification sound estimation model). The notification sound estimation model is a model for estimating, from the sounds collected by the microphones serving as the sensors 10, a sound that should be notified to the user. The method by which the learning unit 235 creates the notification sound estimation model will be described later in detail.
FIG. 4 is a diagram showing an example of the stationary sound information 220 according to the embodiment. The stationary sound information 220 is information about stationary sounds. A stationary sound is a sound that is constantly generated in a space regardless of whether or not a user is present in the space. For example, if the space is a kitchen and a ventilation fan provided in the kitchen is always rotating, the sound of its rotation is a stationary sound. The stationary sound information 220 includes, for example, items such as a stationary sound identifier (shown as stationary sound No. in FIG. 4) and a frequency characteristic. The stationary sound identifier is identification information, such as a number, that uniquely identifies a stationary sound. The frequency characteristic is information indicating the frequency characteristic of the stationary sound identified by the stationary sound identifier, for example, the magnitude (gain) of each frequency band of the sound components included in the stationary sound.
FIG. 5 is a diagram showing an example of the notification sound information 221 according to the embodiment. The notification sound information 221 is information about notification sounds. A notification sound is a sound that should be notified to the user, for example, a ringing sound that sounds when an intercom is operated. The notification sound information 221 includes, for example, items such as a notification sound identifier (shown as notification sound No. in FIG. 5) and a frequency characteristic. The notification sound identifier is identification information, such as a number, that uniquely identifies a notification sound. The frequency characteristic is information indicating the frequency characteristic of the notification sound identified by the notification sound identifier.
The notification sound information 221 may also be generated for each user. For example, when a parent and a child live in the house H, the child's voice is a notification sound for the parent, who is the user.
FIGS. 6 and 7 are diagrams showing examples of the installation information 222 according to the embodiment. The installation information 222 is information indicating the installation positions of the sensors 10 and the speakers 30 installed in the house H. FIG. 6 shows an example of installation information 222A indicating the installation positions of the sensors 10. FIG. 7 shows an example of installation information 222B indicating the installation positions of the speakers 30.
The installation information 222A shown in FIG. 6 includes, for example, items such as a sensor identifier (shown as sensor No. in FIG. 6) and an installation position. The sensor identifier is identification information, such as a number, that uniquely identifies a sensor 10. The installation position is information indicating the installation position of the sensor 10 identified by the sensor identifier.
The installation information 222B shown in FIG. 7 includes, for example, items such as a speaker identifier (shown as speaker No. in FIG. 7) and an installation position. The speaker identifier is identification information, such as a number, that uniquely identifies a speaker 30. The installation position is information indicating the installation position of the speaker 30 identified by the speaker identifier.
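The three tables described above can be represented in memory, for example, by simple records. The following sketch is only one possible layout with hypothetical field names; the description above defines the items of each table but not how they are stored.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class StationarySound:              # stationary sound information 220
    sound_no: int                   # stationary sound identifier
    spectrum: List[float]           # gain for each frequency band

@dataclass
class NotificationSound:            # notification sound information 221
    sound_no: int                   # notification sound identifier
    spectrum: List[float]           # frequency characteristic of the notification sound

@dataclass
class Installation:                 # installation information 222A / 222B
    device_no: int                  # sensor or speaker identifier
    position: Tuple[float, float]   # installation position
```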
(Method for identifying the position of the user)
Here, the method by which the identification unit 231 identifies the position of the user will be described with reference to FIGS. 8 and 9. FIGS. 8 and 9 are diagrams for explaining the processing performed by the identification unit 231 in the embodiment.
As shown in FIG. 8, the identification unit 231 includes, for example, a plurality of stationary sound reduction units 2310, 2311, and 2312, a characteristic sound extraction unit 2313, and a position identification unit 2314.
Information detected by each of the plurality of sensors 10 is input to the plurality of stationary sound reduction units 2310, 2311, and 2312, respectively. In the following description, when the stationary sound reduction units 2310, 2311, and 2312 are not distinguished from each other, they are referred to as a stationary sound reduction unit 231N.
The stationary sound reduction unit 231N reduces the stationary sound from the sound detected by the sensor 10. For example, the stationary sound reduction unit 231N periodically acquires the sound detected by the sensor 10, and performs a Fourier transform on the acquired signal representing the time-series change of the sound, thereby generating a sensor sound frequency characteristic indicating the frequency characteristic of the signal. The stationary sound reduction unit 231N refers to the stationary sound information 220 to acquire the frequency characteristic of the stationary sound, and subtracts the acquired frequency characteristic of the stationary sound from the sensor sound frequency characteristic.
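A minimal sketch of this frequency-domain subtraction is shown below. It assumes that one frame of the sensor sound and the registered stationary sound are available as arrays of matching length; the frame length and the clipping of negative values are arbitrary choices made only for illustration.

```python
import numpy as np

def reduce_stationary_sound(frame, stationary_spectrum):
    """Subtract the stationary sound spectrum from one frame of sensor sound.

    frame               : 1-D array of time-domain samples from the microphone
    stationary_spectrum : magnitude spectrum registered as stationary sound
                          information 220 (same length as rfft of the frame)
    """
    spectrum = np.fft.rfft(frame)                 # sensor sound frequency characteristic
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    reduced = np.maximum(magnitude - stationary_spectrum, 0.0)  # clip negative gains
    return np.fft.irfft(reduced * np.exp(1j * phase), n=len(frame))
```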
Alternatively, the stationary sound reduction unit 231N may reduce the stationary sound by applying a filter to the sound detected by the sensor 10. The filter here is a filter having a characteristic that attenuates the frequency band corresponding to the stationary sound. In this case, the stationary sound information 220 includes, for example, an item such as a filter characteristic. The filter characteristic indicates the characteristic of a filter that attenuates the frequency components corresponding to the stationary sound; for example, when the filter is a digital filter, the filter characteristic is information indicating the filter configuration and coefficients. The stationary sound reduction unit 231N refers to the stationary sound information 220 to acquire the filter characteristic for reducing the stationary sound, and generates a filter for reducing the stationary sound based on the acquired filter characteristic. The stationary sound reduction unit 231N reduces the stationary sound from the sound detected by the sensor 10 by applying the generated filter to that sound.
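The filter-based alternative can be sketched as follows. This fragment assumes that the stationary sound is concentrated around a single known frequency and uses a notch filter from SciPy; a filter characteristic actually stored in the stationary sound information 220 could, of course, be more elaborate.

```python
from scipy.signal import iirnotch, lfilter

def make_stationary_sound_filter(center_hz, sample_rate_hz, q=30.0):
    """Design a notch filter that attenuates the frequency band of a stationary
    sound (for example, the hum of a ventilation fan around 120 Hz)."""
    b, a = iirnotch(w0=center_hz, Q=q, fs=sample_rate_hz)
    return b, a

def apply_stationary_sound_filter(b, a, samples):
    """Apply the generated filter to the sound detected by the sensor."""
    return lfilter(b, a, samples)
```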
The stationary sound reduction unit 231N outputs the sound obtained by reducing the stationary sound from the sound detected by the sensor 10 to the characteristic sound extraction unit 2313 and the position identification unit 2314.
The characteristic sound extraction unit 2313 acquires the sound from which the stationary sound has been reduced from the stationary sound reduction unit 231N. The characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound, and outputs the determination result. The characteristic sound extraction unit 2313 uses, for example, the characteristic sound estimation model to determine whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound.
The characteristic sound extraction unit 2313 inputs the sound acquired from the stationary sound reduction unit 231N into the characteristic sound estimation model, and obtains the result output by the characteristic sound estimation model. The result obtained from the characteristic sound estimation model is an estimation of whether or not the sound input to the model includes a characteristic sound. The characteristic sound extraction unit 2313 outputs the result obtained from the characteristic sound estimation model to the position identification unit 2314.
The position identification unit 2314 acquires the sound from which the stationary sound has been reduced from the stationary sound reduction unit 231N. The position identification unit 2314 also acquires, from the characteristic sound extraction unit 2313, information indicating whether or not the sound includes a characteristic sound. When the sound includes a characteristic sound, the position identification unit 2314 estimates the position and movement of the user based on the sounds acquired from each of the stationary sound reduction units 231N.
The processing by which the position identification unit 2314 estimates the position and movement of the user will be described with reference to FIG. 9. FIG. 9 schematically shows four microphones provided as the sensors 10-1 to 10-4 at the four corners of a space in the house H. FIG. 9 also schematically shows the user moving from the position of user U# toward the position of user U at the center of the space.
In the example of this figure, when the user moves from the position of user U# to the position of user U, the characteristic sound (the sound of the user's movement) acquired by the microphones of the sensors 10-1 and 10-2 gradually becomes quieter. On the other hand, the characteristic sound acquired by the microphones of the sensors 10-3 and 10-4 gradually becomes louder. The position identification unit 2314 detects the position and movement of the user based on such changes in the loudness of the characteristic sound.
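The behaviour described above, in which the characteristic sound becomes louder at the microphones the user approaches and quieter at the ones the user leaves, can be illustrated with the following sketch. It simply treats the microphone observing the loudest characteristic sound as the one nearest to the user; the actual position identification unit 2314 may of course use more elaborate processing.

```python
import numpy as np

def rms_level(samples):
    """Root-mean-square level of an extracted characteristic sound."""
    samples = np.asarray(samples, dtype=float)
    return float(np.sqrt(np.mean(samples ** 2)))

def estimate_nearest_sensor(feature_levels):
    """Estimate which sensor (microphone) the user is closest to.

    feature_levels : dict mapping a sensor identifier to the level of the
                     characteristic sound extracted from that microphone
    """
    return max(feature_levels, key=feature_levels.get)
```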
Here, the stationary sound will be described with reference to FIG. 10. A stationary sound is, for example, a sound (such as noise) that is constantly generated at a certain position. A stationary sound is detected by averaging the frequency characteristics of the sound collected for a certain period of time (for example, one hour) by the microphone provided at that position. Stationary sounds detected in this way include, for example, a sound of the same frequency that is continuously output for a certain period of time, the hum of a ventilation fan, and natural sounds such as running water. The example of FIG. 10 shows that a stationary sound is output from a sound source SA1 and that the frequency characteristic of the stationary sound is a characteristic N1.
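A sketch of how such a stationary sound could be measured is shown below: the magnitude spectra of successive frames collected over a long period are averaged, which preserves components that are always present and smooths out transient ones. The frame length and the averaging period are arbitrary choices made only for illustration.

```python
import numpy as np

def estimate_stationary_spectrum(samples, frame_len=1024):
    """Average the magnitude spectra of consecutive frames to obtain the
    frequency characteristic of the stationary sound at this microphone."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    spectra = [np.abs(np.fft.rfft(frame)) for frame in frames]
    return np.mean(spectra, axis=0)   # could be stored as stationary sound information 220
```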
For example, each of the microphones provided as the sensors 10 in the house H may detect in advance the stationary sound of the location where that microphone is provided. Information in which the stationary sound detected by each microphone is associated with the location where that sensor 10 is provided may be stored in the storage unit 22 as the stationary sound information 220. In this case, the stationary sound reduction unit 231N may reduce, from the sound detected by a sensor 10, the stationary sound corresponding to the location where that sensor 10 is provided.
Here, the characteristic sound will be described with reference to FIG. 11. A characteristic sound is a sound having characteristics different from those of a stationary sound. Specifically, a stationary sound has similar frequency components and is output continuously, whereas a characteristic sound is a sound caused by the presence of a user or the like and is output suddenly and briefly. For example, a characteristic sound is a sound corresponding to a human voice extracted based on its formant structure (the frequency characteristics of the voice of a person who is speaking). The example of FIG. 11 shows that a characteristic sound such as a voice is output from a user SA2 and that its frequency characteristic is a characteristic N2, and that a characteristic sound such as a cry is output from an animal SA3 and that its frequency characteristic is a characteristic N3.
The characteristic sounds may also include a sine wave that does not exist in nature, for example, the operation sound of a touch panel. For example, if the operation sound of a touch panel indicates that a user operating the touch panel is present, the operation sound of the touch panel is a characteristic sound.
Information related to characteristic sounds may be stored in the storage unit 22. In this case, the individual corresponding to a characteristic sound may be made identifiable. Specifically, the harmonic structure (frequency characteristic) of the voice of each of a plurality of users is detected in advance. The plurality of users may include family members, housemates, and animals such as a pet cat. The detected harmonic structure of each voice is stored in the storage unit 22 as the characteristic sound of the corresponding user. When a touch panel or the like is operated only by a specific individual user, the operation sound of that touch panel is stored in the storage unit 22 as a characteristic sound of that specific individual user. In this case, the characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound based on the frequency characteristics of the characteristic sounds stored in the storage unit 22.
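The per-user matching described above can be sketched as follows. It assumes that each user's characteristic sound is stored as a reference magnitude spectrum and reuses a simple spectral-similarity measure; this is only one of many possible ways to compare harmonic structures.

```python
import numpy as np

def identify_user(observed_spectrum, user_templates, threshold=0.85):
    """Return the identifier of the user whose stored characteristic sound is
    most similar to the observed spectrum, or None if nothing matches.

    user_templates : dict {user_id: reference magnitude spectrum}
    threshold      : minimum similarity required for a match (arbitrary)
    """
    obs = observed_spectrum / (np.linalg.norm(observed_spectrum) + 1e-12)
    best_user, best_score = None, threshold
    for user_id, template in user_templates.items():
        ref = template / (np.linalg.norm(template) + 1e-12)
        score = float(np.dot(obs, ref))
        if score >= best_score:
            best_user, best_score = user_id, score
    return best_user
```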
Here, the operation of this embodiment will be described with reference to FIGS. 12 and 13. FIGS. 12 and 13 are diagrams for explaining the processing performed by the control device 20 of the embodiment. As shown in FIG. 12, it is assumed that microphones (sensors 10-1 to 10-4) serving as the sensors 10 are installed at four positions in the house H, and that speakers 30 (speakers 30-1 to 30-4) are installed near the respective microphones. In this space, a stationary sound is output from the sound source SA1. It is also assumed that the user SA2 is present near the sensor 10-1, the animal SA3 is present near the sensor 10-3, and the user SA4 is present near the sensor 10-4.
Here, a case will be described in which the cry of the animal SA3 is detected as a notification sound and the user SA2 is notified, as indicated by the arrow J in FIG. 13. The microphone of the sensor 10-1 detects the stationary sound and the voice of the user SA2. The microphone of the sensor 10-2 detects only the stationary sound. The microphone of the sensor 10-3 detects the stationary sound and the cry of the animal SA3. The microphone of the sensor 10-4 detects the stationary sound and the voice of the user SA4.
Specifically, as shown in FIG. 13, the microphone of the sensor 10-1 detects a sound exhibiting a frequency characteristic T1 in which the stationary sound and the voice of the user SA2 are combined. The microphone of the sensor 10-2 detects a sound exhibiting a frequency characteristic T2 consisting of only the stationary sound. The microphone of the sensor 10-3 detects a sound exhibiting a frequency characteristic T3 in which the stationary sound and the cry of the animal SA3 are combined. The microphone of the sensor 10-4 detects a sound exhibiting a frequency characteristic T4 in which the stationary sound and the voice of the user SA4 are combined.
The stationary sound reduction units 231N reduce the stationary sound from the sounds detected by the sensors 10-1 to 10-4. As a result, only the voice of the user SA2 is extracted from the sound detected by the microphone of the sensor 10-1. Nothing is extracted from the sound detected by the microphone of the sensor 10-2. Only the cry of the animal SA3 is extracted from the sound detected by the microphone of the sensor 10-3. Only the voice of the user SA4 is extracted from the sound detected by the microphone of the sensor 10-4.
The characteristic sound extraction unit 2313 extracts characteristic sounds from the sounds obtained from the sensors 10-1 to 10-4. Specifically, the voice of the user SA2 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-1. No characteristic sound is extracted from the sound detected by the microphone of the sensor 10-2. The cry of the animal SA3 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-3. The voice of the user SA4 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-4.
The position identification unit 2314 identifies that the user SA2 is present at the position of the sensor 10-1 (near the position of the sensor 10-1), that the animal SA3 is present at the position of the sensor 10-3 (near the position of the sensor 10-3), and that the user SA4 is present at the position of the sensor 10-4 (near the position of the sensor 10-4).
The selection unit 232 selects the speaker 30-1, which is installed near the position where the user to be notified, that is, the user SA2, is present. The output unit 233 causes the speaker 30-1 selected by the selection unit 232 to output the cry of the animal SA3 extracted from the microphone of the sensor 10-3.
As shown in the example of FIG. 13, when the level of the stationary sound is high, reducing the stationary sound with the stationary sound reduction unit 231N lowers the level of the remaining sound, which may make it difficult to extract a characteristic sound from the sound after the stationary sound has been reduced. As a countermeasure, when the level of the stationary sound is greater than a predetermined threshold, the stationary sound reduction unit 231N may output the sound detected by the microphone to the characteristic sound extraction unit 2313 as it is, without reducing the stationary sound.
When the characteristic sound extraction unit 2313 extracts a characteristic sound from the sound detected by the microphone using a characteristic sound estimation model, it uses different characteristic sound estimation models depending on whether or not the stationary sound has been reduced. For example, when the stationary sound has been reduced, the characteristic sound extraction unit 2313 uses a model that estimates the characteristic sound from a sound whose stationary sound has been reduced. On the other hand, when the stationary sound has not been reduced, the characteristic sound extraction unit 2313 uses a model that estimates the characteristic sound from a sound whose stationary sound has not been reduced.
FIG. 14 is a flowchart for explaining the flow of processing performed by the control device 20 of the embodiment. The control device 20 acquires the information (sensor information) acquired by the sensors 10 (step S1). When the sensor information includes a sound, the control device 20 determines whether or not the sound is a notification sound (step S2). When the sensor 10 is a microphone, or when some of the plurality of sensors 10 are microphones, the control device 20 determines whether or not the sound is a notification sound by processing such as comparing the frequency characteristic of the sound collected by the microphone with the frequency characteristics of the sounds stored in the storage unit 22 as the notification sound information 221.
When determining that the sound is a notification sound, the control device 20 extracts all sensor information used to identify the position of the user (step S3). For example, when identifying the position of the user using images, the control device 20 extracts the information of the images acquired by the image sensors. When identifying the position of the user using temperature, the control device 20 extracts information indicating the temperatures acquired by the infrared sensors and temperature sensors. When identifying the position of the user using sound, the control device 20 extracts information indicating the sounds collected by the microphones. In the following flow, the case where the control device 20 identifies the position of the user using sound will be described.
The control device 20 reduces the stationary sound from the sounds collected by the microphones (step S4). The control device 20 extracts characteristic sounds from the sounds from which the stationary sound has been reduced (step S5). The control device 20 identifies the position of the user from the extracted characteristic sounds (step S6). The control device 20 selects the speaker 30 that is to output the notification sound based on the identified position of the user (step S7). The control device 20 outputs the notification sound from the selected speaker 30 (step S8).
On the other hand, when determining in step S2 that the sound is not a notification sound, the control device 20 ends the processing.
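The flow of steps S1 to S8 described above can be summarised in a single routine. The sketch below is only an illustration of the control flow: all of the processing steps are passed in as callbacks (they correspond to the notification sound determination, stationary sound reduction, characteristic sound extraction, position identification, and speaker selection already described), and none of their names come from this description.

```python
def handle_sensor_sounds(mic_signals, is_notification, reduce_stationary,
                         extract_feature, level, sensor_position,
                         select_speaker, play):
    """Control flow corresponding to steps S1 to S8 of FIG. 14.

    mic_signals       : dict {sensor_id: sound collected by that microphone}   (S1)
    is_notification   : callable(sound) -> bool                                (S2)
    reduce_stationary : callable(sensor_id, sound) -> sound with the
                        stationary sound reduced                               (S4)
    extract_feature   : callable(sound) -> characteristic sound or None        (S5)
    level             : callable(characteristic sound) -> loudness value
    sensor_position   : callable(sensor_id) -> position of that sensor         (S6)
    select_speaker    : callable(position) -> speaker identifier               (S7)
    play              : callable(speaker_id, sound), drives a speaker 30       (S8)
    """
    for sound in mic_signals.values():
        if not is_notification(sound):
            continue
        levels = {}
        for sensor_id, signal in mic_signals.items():                          # S3
            feature = extract_feature(reduce_stationary(sensor_id, signal))
            levels[sensor_id] = level(feature) if feature is not None else 0.0
        user_position = sensor_position(max(levels, key=levels.get))
        play(select_speaker(user_position), sound)
```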
The method by which the learning unit 235 generates the learned models (the position estimation model, the characteristic sound estimation model, and the notification sound estimation model) will be described with reference to FIG. 15. FIG. 15 is a flowchart for explaining the flow of processing for generating a machine learning model according to the embodiment.
(Position estimation model)
First, the position estimation model will be described. The position estimation model is a model for estimating the position of the user. Here, the case where the position estimation model estimates whether or not a sound collected by a microphone is a sound caused by the user will be described as an example.
The learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11). The learning unit 235 generates a learning data set (step S12). The learning data set here is information in which each sound collected by the microphone is labeled to indicate whether or not the sound is caused by the user. The learning unit 235 determines whether or not all sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not yet been acquired, the learning unit 235 returns to step S11.
When all the sensor information for machine learning has been acquired, the learning unit 235 generates the position estimation model (learned model) (step S14). The learning unit 235 generates the position estimation model (learned model) by having a machine learning model such as a CNN (Convolutional Neural Network) learn the learning data set. The learning unit 235 repeatedly trains the machine learning model on the learning data set while adjusting the parameters of the machine learning model so that, when a sound collected by the microphone in the learning data set is input to the machine learning model, the value output from the machine learning model approaches the label attached to that sound (the label indicating whether or not the sound is caused by the user). This makes it possible to generate a model that can accurately estimate whether or not a sound collected by the microphone is caused by the user. The learning unit 235 stores the generated position estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
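A minimal training loop of the kind described above might look as follows. This sketch assumes PyTorch and a small convolutional network operating on fixed-size spectrogram inputs; the network architecture, loss function, and optimiser are not specified in this description and are chosen here only for illustration.

```python
import torch
import torch.nn as nn

class SoundClassifier(nn.Module):
    """Small CNN that outputs a score for whether an input spectrogram
    (1 x 64 x 64) is a sound caused by the user."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, 1),
        )

    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10):
    """loader yields (spectrogram batch, label batch) pairs, where a label is
    1.0 if the sound is caused by the user and 0.0 otherwise."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for spectrograms, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(spectrograms).squeeze(1), labels)
            loss.backward()   # adjust parameters so the output approaches the label
            optimizer.step()
    return model
```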
(Characteristic sound estimation model)
Next, the characteristic sound estimation model will be described. The characteristic sound estimation model is a model for estimating a characteristic sound. Here, the case where the characteristic sound estimation model estimates whether or not a sound collected by a microphone is a characteristic sound will be described as an example.
The learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11). The learning unit 235 generates a learning data set (step S12). The learning data set here is information in which each sound collected by the microphone is labeled to indicate whether or not the sound is a characteristic sound. The learning unit 235 determines whether or not all sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not yet been acquired, the learning unit 235 returns to step S11.
When all the sensor information for machine learning has been acquired, the learning unit 235 generates the characteristic sound estimation model (learned model) (step S14). The learning unit 235 generates the characteristic sound estimation model (learned model) by having a machine learning model such as a CNN learn the learning data set. The learning unit 235 repeatedly trains the machine learning model on the learning data set while adjusting the parameters of the machine learning model so that, when a sound collected by the microphone in the learning data set is input to the machine learning model, the value output from the machine learning model approaches the label attached to that sound (the label indicating whether or not the sound is a characteristic sound). This makes it possible to generate a model that can accurately estimate whether or not a sound collected by the microphone is a characteristic sound. The learning unit 235 stores the generated characteristic sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
(Notification sound estimation model)
Next, the notification sound estimation model will be described. The notification sound estimation model is a model for estimating a notification sound. Here, the case where the notification sound estimation model estimates whether or not a sound collected by a microphone is a notification sound will be described as an example.
The learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11). The learning unit 235 generates a learning data set (step S12). The learning data set here is information in which each sound collected by the microphone is labeled to indicate whether or not the sound is a notification sound. The learning unit 235 determines whether or not all sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not yet been acquired, the learning unit 235 returns to step S11.
When all the sensor information for machine learning has been acquired, the learning unit 235 generates the notification sound estimation model (learned model) (step S14). The learning unit 235 generates the notification sound estimation model (learned model) by having a machine learning model such as a CNN learn the learning data set. The learning unit 235 repeatedly trains the machine learning model on the learning data set while adjusting the parameters of the machine learning model so that, when a sound collected by the microphone in the learning data set is input to the machine learning model, the value output from the machine learning model approaches the label attached to that sound (the label indicating whether or not the sound is a notification sound). This makes it possible to generate a model that can accurately estimate whether or not a sound collected by the microphone is a notification sound. The learning unit 235 stores the generated notification sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
As described above, the control device 20 according to the embodiment controls the sounds output by each of the plurality of speakers 30 provided in a space. The control device 20 includes the acquisition unit 230, the identification unit 231, the selection unit 232, and the output unit 233. The acquisition unit 230 acquires the sensor information (detection results) detected by the plurality of sensors 10 provided in the space. The identification unit 231 identifies the position of a user present in the space based on the detection results acquired by the acquisition unit 230. The selection unit 232 selects, from among the plurality of speakers 30, a speaker 30 installed in the vicinity of the position of the user identified by the identification unit 231. The selection unit 232 may select the speaker 30 installed closest to the position of the user from among the plurality of speakers 30. The output unit 233 causes the speaker selected by the selection unit 232 to output sound.
In the control device 20 according to the embodiment, the identification unit 231 identifies the position of the user based on the output obtained by inputting the detection results into the position estimation model. The position estimation model is a learned model that has learned the correspondence between a detection result and whether or not the detection result was detected due to the presence of a user, using a learning data set in which each detection result is labeled to indicate whether or not it was detected due to the presence of a user.
In the control device 20 according to the embodiment, the sensor 10 may be a microphone. The identification unit 231 identifies the position of the user based on the output obtained by inputting the detection result into the position estimation model. The position estimation model is a model generated by performing learning using a learning data set in which sounds for machine learning are labeled to indicate whether or not each sound is caused by the presence of a user, and it estimates whether or not a sound detected by the microphone is a sound caused by the presence of the user.
 In the control device 20 according to the embodiment, the sensor 10 may be a microphone. When the microphone collects a sound having characteristics different from the steady sound stored in the storage unit 22 as the steady sound information 220 (a predetermined steady sound), the identification unit 231 determines that a user is present near the microphone and identifies the installation position of that microphone as the user's position.
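 One possible way to realize this comparison, assuming the steady sound information 220 is kept as an average magnitude spectrum per microphone, is sketched below; the spectral-deviation measure and the threshold value are illustrative assumptions.

import numpy as np

def differs_from_steady_sound(frame, steady_spectrum, threshold=3.0):
    """frame: 1-D audio samples; steady_spectrum: reference magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame, n=2 * (len(steady_spectrum) - 1)))
    deviation = np.linalg.norm(spectrum - steady_spectrum) / np.linalg.norm(steady_spectrum)
    return deviation > threshold   # True: a user is presumed to be near this microphone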
 In the control device 20 according to the embodiment, the sensors 10 may include a microphone. The identification unit 231 determines, based on the characteristics of a sound collected by the microphone, whether the sound is a notification sound to be notified to the user. The output unit 233 causes the speaker 30 selected by the selection unit 232 to output the sound determined by the identification unit 231 (an example of a determination unit) to be a notification sound.
 In the control device 20 according to the embodiment, the sensors 10 may include a microphone. The identification unit 231 determines whether a sound collected by the microphone is a notification sound based on the output obtained by inputting the sound into a notification sound estimation model. The notification sound estimation model is a model generated by performing learning using a training data set in which each sound for machine learning is labeled to indicate whether the sound is a notification sound to be notified to the user, and the model estimates whether a sound detected by the microphone is a notification sound.
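 By way of example only, the judgment and routing described here could look like the following sketch, in which notification_model stands for the trained notification sound estimation model and play stands for the output unit; both names and the threshold are assumptions.

def route_if_notification(sound, notification_model, speaker, play, threshold=0.5):
    probability = notification_model(sound)   # output of the estimation model
    if probability >= threshold:              # judged to be a notification sound
        play(speaker, sound)                  # output unit 233: replay near the user
        return True
    return False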
(Modification 1 of Embodiment)
 Modification 1 of the embodiment will now be described. This modification differs from the embodiment described above in that additional learning is performed on the notification sound estimation model. FIG. 16 is a flowchart explaining the flow of the additional learning process according to Modification 1 of the embodiment.
 The control device 20 acquires sensor information for estimation, in this case information on a sound collected by a microphone (step S21). The control device 20 uses the notification sound estimation model (trained model) to estimate whether the sound collected by the microphone is a notification sound (step S22). The control device 20 then determines whether the estimation by the notification sound estimation model is correct (step S23). For example, the control device 20 makes this determination based on information entered by the user operating a keyboard or the like.
 If the estimation by the notification sound estimation model is incorrect, the control device 20 generates a training data set for additional learning (step S24). The training data set for additional learning consists of the sounds that the notification sound estimation model estimated incorrectly, each attached with the correct label indicating whether the sound is a notification sound.
 The control device 20 determines whether to perform additional learning (step S25). For example, the control device 20 determines to perform additional learning when the number of training examples for additional learning generated in step S24 reaches a predetermined number. Alternatively, the control device 20 may determine to perform additional learning when the probability of the notification sound estimation model making an incorrect estimation is equal to or greater than a predetermined value.
 When performing additional learning, the control device 20 has the model learn the training data set for additional learning, thereby updating the notification sound estimation model (trained model) (step S26). The control device 20 stores the updated notification sound estimation model (trained model) (step S27).
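 A minimal sketch of this additional-learning flow (steps S21 to S27) is given below, assuming the user's correction is available as a boolean label and that a retrain callable performs the actual additional learning; the buffer size of 32 is an illustrative assumption.

additional_samples = []          # training data set for additional learning (S24)

def on_estimation_feedback(sound, predicted_is_notification, user_says_notification,
                           model, retrain, min_samples=32):
    if predicted_is_notification == user_says_notification:
        return model             # estimation was correct (S23): nothing to learn
    # S24: keep the misjudged sound together with its corrected label.
    additional_samples.append((sound, user_says_notification))
    # S25: decide whether enough corrections have accumulated.
    if len(additional_samples) < min_samples:
        return model
    # S26: additional learning on the accumulated corrections; S27: store the update.
    updated = retrain(model, additional_samples)
    additional_samples.clear()
    return updated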
 As described above, in the control device 20 according to Modification 1 of the embodiment, the acquisition unit 230 acquires the result of the user's judgment as to whether the sound output by the output unit 233 is a notification sound. The learning unit 235 performs additional learning on the notification sound estimation model based on the judgment result acquired by the acquisition unit 230. As a result, the control device 20 according to Modification 1 can perform additional learning on the notification sound estimation model and, when an estimation is incorrect, correct that estimation error. Alternatively, when a new notification sound is introduced, the notification sound estimation model can be trained on the added notification sound.
 Although the above describes an example in which additional learning is performed on the notification sound estimation model, the present disclosure is not limited to this. The control device 20 can also perform additional learning on the position estimation model and the feature sound estimation model using a similar method.
(Modification 2 of Embodiment)
 Modification 2 of the embodiment will now be described. FIG. 17 shows an example in which a plurality of users U1 and U2 are present in a house H. This modification differs from the embodiment described above in that user U1 carries a transmitting terminal T. The transmitting terminal T is a terminal device that transmits a signal indicating the presence of user U1, and is, for example, a smartphone or a beacon terminal. The acquisition unit 230 acquires the signal (position signal) transmitted from the transmitting terminal T. The identification unit 231 identifies the position of user U1 (the transmitting terminal) based on the signal (position signal) acquired by the acquisition unit 230.
 As described above, in the control device 20 according to Modification 2 of the embodiment, a plurality of users U1 and U2 may be present in the space. At least one user U1 among the plurality of users U1 and U2 carries a transmitting terminal T that transmits a position signal indicating the position of user U1. The acquisition unit 230 acquires the signal (position signal) transmitted from the transmitting terminal T, and the identification unit 231 identifies the position of user U1 based on the acquired signal. As a result, the control device 20 according to Modification 2 can accurately identify the position of the user U1 carrying the transmitting terminal T based on the signal transmitted from that terminal.
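 For illustration, if the position signal is a beacon whose received signal strength (RSSI) is measured by receivers with known installation positions, the position of user U1 could be approximated as sketched below; the RSSI-based nearest-receiver rule is an assumption, not the only possible realization.

def locate_transmitting_terminal(rssi_by_receiver, receiver_positions):
    """rssi_by_receiver: {receiver_id: dBm}; stronger (less negative) means closer."""
    if not rssi_by_receiver:
        return None
    nearest = max(rssi_by_receiver, key=rssi_by_receiver.get)
    # The receiver with the strongest signal is taken as the position of user U1.
    return receiver_positions[nearest]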
(Modification 3 of Embodiment)
 Modification 3 of the embodiment will now be described. This modification assumes a case in which the user listens to a sound that continues for a predetermined period of time, such as music. It differs from the embodiment described above in that, when the user moves within the space, the sound output follows that movement.
 The identification unit 231 (an example of a determination unit) determines, based on the characteristics of a sound collected by a microphone, whether the sound is a sound to be output following the user's movement (that is, a sound for which the speaker 30 that outputs it should be changed to follow the user's movement).
 When the sound collected by the microphone is a sound to be output following the user's movement, the identification unit 231 identifies the user near the microphone that collected the sound. For example, the identification unit 231 identifies the user near the microphone based on a characteristic sound collected together with the sound to be output following the user's movement. While the sound to be output following the user's movement is being acquired by the microphones, the identification unit 231 repeatedly identifies the position of that user.
 When the position of the user identified this time differs from the position identified last time, the selection unit 232 selects a speaker 30 installed near the newly identified position.
 The output unit 233 stops the sound that was being output from the speaker 30 previously selected by the selection unit 232 and causes the newly selected speaker 30 to output the sound to be output following the user's movement.
 As described above, in the control device 20 according to Modification 3 of the embodiment, the sensors 10 may include microphones. The identification unit 231 determines whether a sound collected by a microphone is a sound to be output following the user's movement. The identification unit 231 identifies the user near the microphone that collected the sound determined to be a sound to be output following the user's movement, associates the identified user's position with that sound, and repeatedly identifies the user's position. When the position of the user identified this time differs from the position identified last time, the selection unit 232 selects a speaker 30 installed near the newly identified position. The output unit 233 stops the sound that was being output from the previously selected speaker 30 and causes the newly selected speaker 30 to output the sound to be output following the user's movement. As a result, when the user is listening to music or the like and moves away from the installation position of the speaker 30 outputting that sound, the control device 20 according to Modification 3 can output the sound from a speaker 30 at the user's destination, changing the output speaker 30 to follow the movement. The user can thus continue listening to the sound while moving through the space.
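 A possible sketch of this speaker handoff is shown below, assuming the user's position is re-identified periodically while the followed sound (for example, music) is playing; the FollowingPlayback class and the injected select_nearest, start, and stop callables are assumptions introduced for this example.

class FollowingPlayback:
    def __init__(self, speakers, select_nearest, start, stop):
        self.speakers = speakers
        self.select_nearest = select_nearest   # e.g. nearest-speaker selection
        self.start, self.stop = start, stop    # output-unit operations (assumed)
        self.current_speaker = None

    def update(self, user_position, sound):
        speaker = self.select_nearest(user_position, self.speakers)
        if speaker is self.current_speaker:
            return                             # the user has not moved between speakers
        if self.current_speaker is not None:
            self.stop(self.current_speaker)    # stop the previously selected speaker
        self.start(speaker, sound)             # continue the sound near the user
        self.current_speaker = speaker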
 All or part of the speaker system 1 and the control device 20 in the embodiments described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system. Furthermore, the "computer-readable recording medium" may also include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside the computer system serving as the server or client in that case. The program may be one for realizing part of the functions described above, may be one that realizes the functions described above in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA.
 Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are likewise included in the invention described in the claims and its equivalents.
 The present disclosure may be applied to a control method, a control device, and a program.
Reference Signs List
1 speaker system
10 sensor
20 control device
230 acquisition unit
231 identification unit (determination unit)
232 selection unit
233 output unit
234 device control unit
235 learning unit
30 speaker

Claims (13)

  1.  A control method performed by a computer, the method comprising:
     acquiring a detection result detected by at least one sensor provided in a space;
     identifying a position of a user in the space based on the detection result;
     selecting, from a plurality of speakers provided in the space, a speaker installed near the identified position of the user; and
     causing the selected speaker to output sound.
  2.  The control method according to claim 1, wherein identifying the position of the user includes identifying the position of the user based on an output obtained by inputting the detection result into a position estimation model, and
     the position estimation model is a model generated by performing learning using a training data set including detection results for machine learning, each labeled to indicate whether the detection result was detected due to the presence of the user.
  3.  The control method according to claim 1 or 2, further comprising:
     acquiring, from a transmitting terminal carried by a second user present in the space, a position signal indicating a position of the second user; and
     identifying the position of the second user based on the position signal.
  4.  The control method according to any one of claims 1 to 3, wherein identifying the position of the user includes identifying the position of the user based on an output obtained by inputting a sound detected by the sensor into a position estimation model, and
     the position estimation model is a model generated by performing learning using a training data set including sounds each labeled to indicate whether the sound is caused by the presence of the user.
  5.  The control method according to any one of claims 1 to 4, wherein identifying the position of the user includes identifying an installation position of the sensor as the position of the user when a sound having characteristics different from a predetermined steady sound is collected by the sensor.
  6.  The control method according to any one of claims 1 to 5, further comprising determining, based on characteristics of a sound collected by the sensor, whether the sound collected by the sensor is a notification sound to be notified to the user,
     wherein causing the selected speaker to output the sound includes causing the selected speaker to output the sound collected by the sensor when the sound collected by the sensor is determined to be the notification sound.
  7.  The control method according to any one of claims 1 to 6, further comprising determining, based on an output obtained by inputting a sound collected by the sensor into a notification sound estimation model, whether the sound collected by the sensor is a notification sound to be notified to the user,
     wherein the notification sound estimation model is a model generated by performing learning using a training data set including sounds for machine learning, each labeled to indicate whether the sound is the notification sound.
  8.  The control method according to claim 7, further comprising:
     acquiring a result of a judgment by the user as to whether the output sound is the notification sound; and
     performing additional learning on the notification sound estimation model based on the judgment result.
  9.  The control method according to any one of claims 1 to 8, further comprising setting at least one speaker other than the selected speaker among the plurality of speakers to a standby mode.
  10.  The control method according to any one of claims 1 to 9, further comprising determining, based on characteristics of a sound collected by the sensor, whether the sound collected by the sensor is a sound to be output following movement of the user, wherein
     identifying the position of the user includes repeating the identification of the position of the user when the sound collected by the sensor is determined to be the sound to be output following the movement of the user,
     selecting the speaker includes selecting, from the plurality of speakers, a speaker installed near the position of the user identified this time when the position of the user identified last time differs from the position of the user identified this time, and
     causing the selected speaker to output sound includes causing the speaker installed near the position of the user identified this time to output the sound to be output following the movement of the user.
  11.  The control method according to claim 10, further comprising stopping output of the sound to be output following the movement of the user from the speaker that was installed near the position of the user identified last time.
  12.  A control device comprising:
     an acquisition unit configured to acquire a detection result detected by at least one sensor provided in a space;
     an identification unit configured to identify a position of a user in the space based on the detection result;
     a selection unit configured to select, from a plurality of speakers provided in the space, a speaker installed near the identified position of the user; and
     an output unit configured to cause the selected speaker to output sound.
  13.  A program causing a computer to execute:
     acquiring a detection result detected by at least one sensor provided in a space;
     identifying a position of a user in the space based on the detection result;
     selecting, from a plurality of speakers provided in the space, a speaker installed near the identified position of the user; and
     causing the selected speaker to output sound.
PCT/JP2022/003789 2021-03-26 2022-02-01 Control method, control device, and program WO2022201876A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-052696 2021-03-26
JP2021052696A JP2022150204A (en) 2021-03-26 2021-03-26 Control method, controller, and program

Publications (1)

Publication Number Publication Date
WO2022201876A1 true WO2022201876A1 (en) 2022-09-29

Family

ID=83396791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/003789 WO2022201876A1 (en) 2021-03-26 2022-02-01 Control method, control device, and program

Country Status (2)

Country Link
JP (1) JP2022150204A (en)
WO (1) WO2022201876A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010200212A (en) * 2009-02-27 2010-09-09 Sony Corp Information processing apparatus and method, and program
JP2021002013A (en) * 2019-06-24 2021-01-07 日本キャステム株式会社 Notification sound detection device and notification sound detection method
JP2021015084A (en) * 2019-07-16 2021-02-12 Kddi株式会社 Sound source localization device and sound source localization method

Also Published As

Publication number Publication date
JP2022150204A (en) 2022-10-07

Similar Documents

Publication Publication Date Title
US9668073B2 (en) System and method for audio scene understanding of physical object sound sources
US11308955B2 (en) Method and apparatus for recognizing a voice
JP2020505648A (en) Change audio device filter
JP7212718B2 (en) LEARNING DEVICE, DETECTION DEVICE, LEARNING METHOD, LEARNING PROGRAM, DETECTION METHOD, AND DETECTION PROGRAM
CN104103271B (en) Method and system for adapting speech recognition acoustic models
US20200152191A1 (en) Information processor and information procesing method
JP5626372B2 (en) Event detection system
Portet et al. Context-aware voice-based interaction in smart home-vocadom@ a4h corpus collection and empirical assessment of its usefulness
WO2019235134A1 (en) Information generation device, information processing system, information processing method, and program
Yang et al. Soundr: head position and orientation prediction using a microphone array
WO2022201876A1 (en) Control method, control device, and program
JP7452528B2 (en) Information processing device and information processing method
WO2021199284A1 (en) Information processing device, information processing method, and information processing program
WO2021226574A1 (en) System and method for multi-microphone automated clinical documentation
US11620997B2 (en) Information processing device and information processing method
JP7408518B2 (en) Information processing device, information processing method, information processing program, terminal device, inference method, and inference program
US20220360935A1 (en) Sound field control apparatus and method for the same
US20220391758A1 (en) Sound detection for electronic devices
JP6688820B2 (en) Output device, output method, and output program
US10601757B2 (en) Multi-output mode communication support device, communication support method, and computer program product
Völker et al. iHouse: A Voice-Controlled, Centralized, Retrospective Smart Home
JP2020086011A (en) Extraction device, learning device, extraction method, extraction program, learning method, and learning program
WO2023141564A1 (en) Data augmentation system and method for multi-microphone systems
WO2023141557A1 (en) Data augmentation system and method for multi-microphone systems
WO2023141565A1 (en) Data augmentation system and method for multi-microphone systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22774678

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22774678

Country of ref document: EP

Kind code of ref document: A1