WO2022201876A1 - Control method, control device, and program - Google Patents

Control method, control device, and program

Info

Publication number
WO2022201876A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
user
sensor
unit
speaker
Prior art date
Application number
PCT/JP2022/003789
Other languages
French (fr)
Japanese (ja)
Inventor
Ryotaro Aoki
Original Assignee
Yamaha Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Publication of WO2022201876A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/12Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers

Definitions

  • the present invention relates to a control method, a control device, and a program.
  • This application claims priority based on Japanese Patent Application No. 2021-052696 filed on March 26, 2021, the entire disclosure of which is incorporated herein.
  • Patent Literature 1 discloses a technology related to an information processing apparatus capable of responding in accordance with the user's intention.
  • In the technology described in Patent Literature 1, feedback from the home console is output as voice through the user's interaction with a home console installed at a predetermined location. For example, when the user utters "Please heat the bath" and the bath has not finished heating by the time the user comes home, that fact is fed back to the user. However, the user is likely to move away after giving an instruction to the home console at a certain location. In such a case, since the user is no longer near the home console, there is a problem that the feedback from the home console cannot be heard.
  • An example of an object of the present invention is to provide a control method, a control device, and a program capable of selecting a speaker as the sound output destination so that the user can hear a sound notified to the user even when the user moves.
  • One aspect of the present invention is a control method performed by a computer, the method including: acquiring a detection result detected by at least one sensor provided in a space; specifying a position of a user in the space based on the detection result; selecting, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and causing the selected speaker to output sound.
  • Another aspect of the present invention is a control device including: an acquisition unit that acquires a detection result detected by at least one sensor provided in a space; a specifying unit that specifies a position of a user in the space based on the detection result; a selection unit that selects, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and an output unit that causes the selected speaker to output sound.
  • Another aspect of the present invention is a program that causes a computer to execute: acquiring a detection result detected by at least one sensor provided in a space; specifying a position of a user in the space based on the detection result; selecting, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and causing the selected speaker to output sound.
  • FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 in the embodiment.
  • FIG. 3 is a block diagram showing an example of the configuration of the control device 20 in the embodiment.
  • FIG. 4 is a diagram showing an example of the stationary sound information 220 in the embodiment.
  • FIG. 5 is a diagram showing an example of the notification sound information 221 in the embodiment.
  • FIG. 1 is a diagram showing an overview of a speaker system 1 according to an embodiment.
  • the speaker system 1 is applied in a space where a user lives, such as a house H, for example.
  • house H is provided with a plurality of sensors 10-1 to 10-3, a control device 20, and a plurality of speakers 30-1 to 30-3.
  • Sensors 10-1 to 10-3 (and sensors 10-1 to 10-N to be described later) are referred to as sensor 10 when they are not distinguished from each other.
  • The speakers 30-1 to 30-3 (and speakers 30-1 to 30-M to be described later) are referred to as the speaker 30 when they are not distinguished from each other.
  • Each of the multiple sensors 10 is provided in a different space in the house H.
  • each of the plurality of sensors 10 is provided in the entrance of the house H, the kitchen, the living room, the bedroom, and the like.
  • Each of the plurality of speakers 30 is also provided in a different space in the house H. Both the sensor 10 and the speaker 30 may be provided in the same space. Only one of the sensor 10 and the speaker 30 may be provided in a space such as the entrance of the house H, the kitchen, the living room, or the bedroom. Also, the number of sensors 10 and the number of speakers 30 connected to the control device 20 may be determined arbitrarily.
  • FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 according to the embodiment.
  • the speaker system 1 includes, for example, multiple sensors 10-1 to 10-N, a control device 20, and multiple speakers 30-1 to 30-M.
  • the plurality of sensors 10, the control device 20, and the plurality of speakers 30 are communicably connected via a communication network NW.
  • the communication network NW may be a wide area network, that is, a WAN (Wide Area Network), the Internet, or a combination thereof.
  • the communication network NW may be a communication network that is communicatively connected by a wired connection such as a cable or a wireless connection such as a wireless LAN.
  • the sensor 10 includes, for example, a sensor section and a communication section.
  • the sensor unit acquires information that can detect the position and movement of the user.
  • the communication unit transmits information acquired by the sensor unit to the control device 20 .
  • For example, the sensor unit is a microphone (see, for example, FIG. 9).
  • When the sensor unit is a microphone, it collects sound propagating in the space in which it is provided. It is possible to detect the user's position and movement by analyzing the sound collected by the microphone. A method for detecting the user's position and movement based on the sound collected by the microphone will be described later in detail.
  • The sensor unit is not limited to a microphone.
  • The sensor unit may be any sensor that can acquire information from which the position and movement of the user can be detected.
  • For example, the sensor unit may be an image sensor, an infrared sensor, a temperature sensor, an optical sensor, or the like.
  • When the sensor unit is an image sensor, it captures an image of the space in which the sensor 10 is provided. By analyzing the image, it is possible to detect the position and movement of the user.
  • When the sensor unit is an infrared sensor, a temperature sensor, an optical sensor, or the like, it detects the temperature measured by infrared rays or a thermometer, or the phase difference between irradiated light and reflected light. It is possible to detect the position and movement of the user based on changes in temperature, phase difference, and the like.
  • the speaker 30 is connected to the control device 20 and outputs sound based on the control of the control device 20 .
  • the speaker 30 may be any speaker device as long as it can output sound at least based on the control of the control device 20 .
  • the control device 20 is, for example, a computer device such as a PC (Personal Computer) or a server device.
  • the control device 20 receives information acquired by each of the sensors 10 .
  • the control device 20 identifies the position of the user based on the acquired information.
  • the control device 20 outputs sound from the speaker 30 provided near the position identified as the user's presence.
  • FIG. 3 is a block diagram showing an example of the configuration of the control device 20 in the embodiment.
  • the control device 20 includes a communication unit 21, a storage unit 22, and a control unit 23, for example.
  • the communication unit 21 is realized by, for example, a general-purpose communication IC (Integrated Circuit).
  • the communication unit 21 communicates with the sensor 10 and the speaker 30 via the communication network NW.
  • the storage unit 22 is realized by, for example, a storage device (a storage device with a non-transitory storage medium) such as a HDD (Hard Disk Drive) or flash memory, or a combination thereof.
  • the storage unit 22 stores a program for realizing each component (each function) of the control device 20, variables used when executing the program, and various kinds of information.
  • the storage unit 22 stores stationary sound information 220, notification sound information 221, installation information 222, and learned model information 223, for example. Details of the information stored in the storage unit 22 will be described later.
  • the control unit 23 is implemented by causing a CPU provided as hardware in the control device 20 to execute a program.
  • the control unit 23 includes an acquisition unit 230, an identification unit 231, a selection unit 232, an output unit 233, a device control unit 234, and a learning unit 235, for example.
  • the acquisition unit 230 acquires information acquired by the sensor 10 via the communication unit 21 and outputs the acquired information to the identification unit 231 .
  • the specifying unit 231 specifies the user's position based on the information acquired from the acquiring unit 230 . A method by which the identifying unit 231 identifies the position of the user will be described later in detail.
  • the specifying unit 231 outputs information indicating the specified position of the user to the selecting unit 232 .
  • the specifying unit 231 specifies the notification sound based on the information acquired from the acquiring unit 230 .
  • the notification sound is a sound to be notified to the user, and is, for example, a ringing sound of an interphone or the like.
  • the specifying unit 231 specifies, as the notification sound, a sound having frequency characteristics similar to the frequency characteristics of the sound stored in advance as the notification sound information 221 in the storage unit 22 .
  • the specifying unit 231 is an example of a “determination unit”.
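As an illustration of this frequency-characteristic comparison, here is a minimal sketch assuming the notification sound information stores per-band gains and that cosine similarity against a fixed threshold is an acceptable similarity measure; both choices, and all names below, are assumptions rather than details from the publication.

```python
import numpy as np

def band_gains(signal: np.ndarray, n_bands: int = 32) -> np.ndarray:
    """Summarise a sound's frequency characteristic as per-band average magnitudes."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])

def is_notification_sound(collected: np.ndarray,
                          registered_gains: list[np.ndarray],
                          threshold: float = 0.9) -> bool:
    """True if the collected sound resembles any sound registered as notification sound information."""
    gains = band_gains(collected)
    for ref in registered_gains:
        similarity = float(np.dot(gains, ref) /
                           (np.linalg.norm(gains) * np.linalg.norm(ref) + 1e-12))
        if similarity >= threshold:
            return True
    return False
```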
  • the selection unit 232 selects the speaker 30 close to the user's position based on the information indicating the user's position acquired from the identification unit 231 .
  • the selection unit 232 refers to the installation information 222 based on information indicating the user's position, for example.
  • the installation information 222 is information indicating the position where the speaker 30 is installed.
  • the selection unit 232 compares the position of the user with the positions where the speakers 30 are installed, and selects the speaker 30 close to the position where the user is. For example, the selection unit 232 selects the speaker 30 closest to the user from the plurality of speakers 30 .
  • the selection unit 232 outputs information indicating the selected speaker 30 to the output unit 233 .
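A minimal sketch of this selection step, assuming the installation information 222 can be reduced to 2-D coordinates and that "near" means smallest Euclidean distance; the coordinates and the dictionary layout below are illustrative assumptions.

```python
import math

# Hypothetical view of installation information 222B: speaker No -> installation position (x, y) in metres.
SPEAKER_POSITIONS = {1: (0.5, 0.5), 2: (4.0, 0.5), 3: (0.5, 3.5), 4: (4.0, 3.5)}

def select_speaker(user_position: tuple[float, float]) -> int:
    """Return the identifier of the speaker installed closest to the specified user position."""
    return min(SPEAKER_POSITIONS,
               key=lambda no: math.dist(SPEAKER_POSITIONS[no], user_position))

# Example: a user located at (3.8, 3.0) would be served by speaker No 4.
```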
  • Based on the information indicating the speaker 30 acquired from the selection unit 232, the output unit 233 causes the speaker 30 indicated in that information to output sound.
  • the sound that the output unit 233 causes the speaker 30 to output may be any sound as long as it is a sound that should be notified to the user.
  • the output unit 233 causes the speaker 30 to output the notification sound of an intercom or the like, music that the user is listening to, or the like.
  • the output unit 233 may set the speaker 30 that is not outputting sound among the plurality of speakers 30 to the standby mode.
  • the device control unit 234 comprehensively controls the control device 20 .
  • the device control unit 234 outputs information received by the communication unit 21 , that is, information acquired from the sensor 10 to the acquisition unit 230 .
  • the device control unit 234 also outputs information output by the output unit 233 , that is, information indicating the sound to be output by the speaker 30 to the communication unit 21 . As a result, sound is output from the speaker 30 .
  • the learning unit 235 generates a learned model by subjecting the machine learning model to machine learning.
  • the learning unit 235 generates a model for estimating the user's position (hereinafter referred to as a position estimation model).
  • the position estimation model is a model for estimating the user's position from sounds collected by a microphone as the sensor 10 . The method by which the learning unit 235 creates the position estimation model will be described later in detail.
  • the learning unit 235 also generates a model for estimating characteristic sounds (hereinafter referred to as a characteristic sound estimation model).
  • a feature sound estimation model is a model for estimating whether or not a sound includes a feature sound.
  • a characteristic sound here is a characteristic sound caused by the user's position or movement. The characteristic sound may be the characteristic sound of an unspecified user or the characteristic sound of a specific user.
  • A characteristic sound of an unspecified user is a sound from which an individual user cannot be identified when a plurality of users live in the house H, and is a characteristic sound caused by the positions and movements of the users.
  • the characteristic sound of an unspecified user is the sound of opening and closing the door of the living room, the startup sound or operation sound generated by operating a television or the like.
  • a characteristic sound of a specific user is a characteristic sound caused by the positions and movements of individual users when multiple users live in House H.
  • the characteristic sound of a specific user is the sound uttered by the specific user, footsteps of the specific user, and the like.
  • the feature sound estimation model outputs the result of estimating whether or not the sound input to the model is a feature sound that indicates the position or movement of a specific user.
  • the method by which the learning unit 235 creates the feature sound estimation model will be described later in detail.
  • the learning unit 235 also generates a model for estimating the notification sound (hereinafter referred to as the notification sound estimation model).
  • the notification sound estimation model is a model for estimating the sound to be notified to the user from the sound collected by the microphone as the sensor 10 .
  • the method by which the learning unit 235 creates the notification sound estimation model will be described later in detail.
  • FIG. 4 is a diagram showing an example of stationary sound information 220 in the embodiment.
  • the stationary sound information 220 is information about stationary sounds.
  • a steady sound is a sound that is constantly generated in a space regardless of whether or not the user is present in the space. For example, if the space is a kitchen and the ventilation fan provided in the kitchen is always rotating, the rotating sound becomes the stationary sound.
  • the steady sound information 220 includes, for example, items such as a steady sound identifier (shown as steady sound No in FIG. 4) and frequency characteristics.
  • a stationary sound identifier is identification information such as a number that uniquely identifies a stationary sound.
  • the frequency characteristic is information indicating the frequency characteristic of the stationary sound identified by the stationary sound identifier.
  • the frequency characteristic is, for example, information indicating the magnitude (gain) of the sound component included in the stationary sound for each frequency band.
  • FIG. 5 is a diagram showing an example of the notification sound information 221 in the embodiment.
  • the notification sound information 221 is information regarding notification sounds.
  • the notification sound is a sound to be notified to the user.
  • the notification sound is, for example, a ringing sound that sounds when the intercom is operated.
  • the notification sound information 221 includes, for example, items such as a notification sound identifier (indicated as notification sound No in FIG. 5) and frequency characteristics.
  • the notification sound identifier is identification information such as a number that uniquely identifies the notification sound.
  • the frequency characteristic is information indicating the frequency characteristic of the notification sound specified by the notification sound identifier.
  • The notification sound information 221 may be generated for each user. For example, when a parent and a child live in the house H, the child's voice is a notification sound for the parent who is the user.
  • FIG. 6 and 7 are diagrams showing examples of the installation information 222 in the embodiment.
  • the installation information 222 is information indicating the installation positions of the sensor 10 and the speaker 30 installed in the house H, respectively.
  • FIG. 6 shows an example of installation information 222A indicating the installation position of the sensor 10.
  • FIG. 7 shows an example of installation information 222B indicating the installation position of the speaker 30.
  • the installation information 222A shown in FIG. 6 includes, for example, items such as a sensor identifier (indicated as sensor No in FIG. 6) and installation position.
  • the sensor identifier is identification information such as a number that uniquely identifies the sensor 10 .
  • the installation position is information indicating the installation position of the sensor 10 specified by the sensor identifier.
  • the installation information 222B shown in FIG. 7 includes, for example, items such as a speaker identifier (shown as speaker No in FIG. 7) and installation position.
  • the speaker identifier is identification information such as a number that uniquely identifies the speaker 30 .
  • the installation position is information indicating the installation position of the speaker 30 specified by the speaker identifier.
  • FIGS. 8 and 9 are diagrams for explaining the processing performed by the identifying unit 231 in the embodiment.
  • the specifying unit 231 includes, for example, a plurality of stationary sound reducing units 2310, 2311, 2312, a characteristic sound extracting unit 2313, and a position specifying unit 2314.
  • Information detected by each of the plurality of sensors 10 is input to each of the plurality of stationary sound reduction units 2310 , 2311 , and 2312 .
  • When the plurality of stationary sound reduction units 2310, 2311, and 2312 are not distinguished from each other, they are referred to as the stationary sound reduction unit 231N.
  • the steady sound reduction unit 231N reduces the steady sound from the sound detected by the sensor 10.
  • For example, the stationary sound reduction unit 231N periodically acquires the sound detected by the sensor 10, performs a Fourier transform on the acquired signal indicating the change in the sound over time, and generates a sensor sound frequency characteristic indicating the frequency characteristic of that signal.
  • the stationary sound reduction unit 231N refers to the stationary sound information 220, acquires the frequency characteristics of the stationary sound, and subtracts the acquired frequency characteristics of the stationary sound from the sensor sound frequency characteristics.
  • the steady sound reduction unit 231N may filter the sound detected by the sensor 10 to reduce the steady sound.
  • the filter here is a filter having characteristics that reduce the frequency band corresponding to stationary sound.
  • the stationary sound information 220 includes items such as filter characteristics.
  • the filter characteristic indicates the characteristic of the filter that reduces the frequency component corresponding to the stationary sound. Filter characteristics are, for example, information indicating the filter configuration and coefficients when the filter is a digital filter.
  • the stationary sound reduction unit 231N refers to the stationary sound information 220, acquires filter characteristics for reducing stationary sounds, and generates a filter for reducing stationary sounds based on the acquired filter characteristics.
  • the steady sound reduction unit 231N reduces the steady sound from the sound detected by the sensor 10 by applying the generated filter to the sound detected by the sensor 10 .
  • the steady sound reduction unit 231N outputs the sound obtained by reducing the steady sound from the sound detected by the sensor 10 to the characteristic sound extraction unit 2313 and the position specifying unit 2314.
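A minimal sketch of the spectral-subtraction variant described above, assuming the stationary sound information holds a magnitude spectrum sampled on the same frequency grid as one sensor frame; the function name and framing are assumptions.

```python
import numpy as np

def reduce_stationary_sound(frame: np.ndarray, stationary_magnitude: np.ndarray) -> np.ndarray:
    """Subtract the stored stationary-sound magnitude from one frame of sensor sound."""
    spectrum = np.fft.rfft(frame)                                  # sensor sound frequency characteristic
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    reduced = np.maximum(magnitude - stationary_magnitude, 0.0)    # clip negative magnitudes to zero
    return np.fft.irfft(reduced * np.exp(1j * phase), n=len(frame))
```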
  • the characteristic sound extraction unit 2313 acquires the sound with the stationary sound reduced from the stationary sound reduction unit 231N.
  • the characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound, and outputs the determination result.
  • the characteristic sound extraction unit 2313 uses, for example, a characteristic sound estimation model to determine whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound.
  • the characteristic sound extraction unit 2313 inputs the sound obtained from the stationary sound reduction unit 231N to the characteristic sound estimation model, thereby obtaining the result obtained from the characteristic sound estimation model.
  • the result obtained from the feature sound estimation model is the result of estimating whether or not the sound input to the model contains the feature sound.
  • the characteristic sound extraction unit 2313 outputs the result obtained from the characteristic sound estimation model to the position specifying unit 2314 .
  • the position specifying unit 2314 acquires the sound with the stationary sound reduced from the stationary sound reducing unit 231N. Also, the position specifying unit 2314 acquires information indicating whether or not the sound includes a characteristic sound from the characteristic sound extracting unit 2313 . If the sound includes a characteristic sound, the position specifying unit 2314 estimates the position and movement of the user based on the sound acquired from each of the steady sound reduction units 231N.
  • FIG. 9 schematically shows four microphones provided as the sensors 10-1 to 10-4 at the four corners of a space in the house H.
  • FIG. 9 also schematically shows the user moving from the position indicated by U# to the position indicated by U in the center of the space.
  • the characteristic sound (the user's moving sound) acquired by the microphones of the sensors 10-1 and 10-2 gradually decreases.
  • the characteristic sound (the user's moving sound) acquired by the microphones of the sensors 10-3 and 10-4 gradually increases.
  • the position specifying unit 2314 detects the position and movement of the user based on such changes in the loudness of the characteristic sound.
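A minimal sketch of turning the per-microphone loudness of the characteristic sound into a position and a movement direction, assuming known microphone coordinates and a simple level-weighted centroid; the publication does not prescribe a specific formula, so this is only one plausible realisation.

```python
import numpy as np

# Hypothetical coordinates (x, y) of the microphones serving as sensors 10-1 to 10-4.
MIC_POSITIONS = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 4.0], [5.0, 4.0]])

def estimate_user_position(feature_levels: np.ndarray) -> np.ndarray:
    """Weight each microphone position by the level of the characteristic sound it picked up."""
    weights = feature_levels / feature_levels.sum()
    return weights @ MIC_POSITIONS

def estimate_movement(levels_before: np.ndarray, levels_now: np.ndarray) -> np.ndarray:
    """The movement is the change of the estimated position between two observations."""
    return estimate_user_position(levels_now) - estimate_user_position(levels_before)
```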
  • Stationary sound is, for example, sound (such as noise) that is constantly occurring at a certain position.
  • a stationary sound is detected by averaging the frequency characteristics of sounds collected for a certain period of time (for example, one hour) by a microphone provided at that position.
  • Stationary sounds that are detected in this way include, for example, sounds that have the same frequency and are continuously output for a certain period of time, the roar of a ventilation fan, and natural sounds such as flowing water.
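A minimal sketch of building such a stationary-sound profile by averaging the frequency characteristics of frames collected over a period, under the assumption that frame-wise FFT magnitudes are a suitable representation:

```python
import numpy as np

def estimate_stationary_profile(frames: list[np.ndarray]) -> np.ndarray:
    """Average the magnitude spectra of frames collected over e.g. one hour at one microphone."""
    spectra = [np.abs(np.fft.rfft(frame)) for frame in frames]
    return np.mean(spectra, axis=0)   # later stored per location as stationary sound information 220
```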
  • FIG. 10 shows that a steady sound is output from the sound source SA1, and the frequency characteristic of the steady sound is the characteristic N1.
  • each of the microphones installed as the sensor 10 in the house H may detect in advance the steady sound of the location where the microphone is installed.
  • Information in which the steady sound detected by each microphone is associated with the location where the sensor 10 is provided may be stored in the storage unit 22 as the steady sound information 220 .
  • the steady sound reduction unit 231N may reduce the steady sound corresponding to the location where the sensor 10 is installed from the sounds detected by the sensor 10 .
  • a characteristic sound is a sound having a characteristic different from that of a steady sound.
  • a stationary sound is a sound that has similar frequency components and is continuously output.
  • a characteristic sound is a sound caused by the presence of a user or the like, and is a sound that is suddenly output in a short period of time.
  • the characteristic sound is a sound corresponding to the human voice extracted based on the formant structure (the frequency characteristics of the voice of the uttering person).
  • FIG. 11 shows that a characteristic sound such as a voice is output from the user SA2, and that the frequency characteristic of that characteristic sound is the characteristic N2.
  • FIG. 11 also shows that a characteristic sound such as a cry is output from the animal SA3, and that the frequency characteristic of that characteristic sound is the characteristic N3.
  • the characteristic sound may include a sine wave that does not exist in nature, such as a touch panel operation sound. For example, if the touch panel operation sound indicates that there is a user operating the touch panel, the touch panel operation sound is the characteristic sound.
  • Information related to the characteristic sound may be stored in the storage unit 22.
  • the individual corresponding to the characteristic sound may be identified.
  • the harmonic structure (frequency characteristics) of each voice of a plurality of users is detected in advance. Multiple users, etc. may include family members, housemates, and animals such as pets.
  • the detected harmonic structure of each voice is stored in the storage unit 22 as the characteristic sound of each user.
  • the operation sound of the touch panel is stored in the storage unit 22 as the characteristic sound of the specific individual user.
  • the characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes the characteristic sound based on the frequency characteristics of the characteristic sounds stored in the storage unit 22 .
  • FIG. 12 and 13 are diagrams for explaining the processing performed by the control device 20 of the embodiment.
  • In the example of FIGS. 12 and 13, microphones are installed as the sensors 10-1 to 10-4 at four positions in the house H.
  • Speakers 30-1 to 30-4 are installed near the respective microphones.
  • a steady sound is output from the sound source SA1.
  • the user SA2 exists near the sensor 10-1.
  • an animal SA3 exists in the vicinity of the sensor 10-3.
  • user SA4 exists near sensor 10-4.
  • the cry of the animal SA3 is detected as a notification sound and notified to the user SA2.
  • the stationary sound and the voice of the user SA2 are detected by the microphone of the sensor 10-1. Only stationary sounds are detected by the microphone of the sensor 10-2.
  • the microphone of the sensor 10-3 detects the stationary sound and the cry of the animal SA3.
  • the stationary sound and the voice of the user SA4 are detected by the microphone of the sensor 10-4.
  • the microphone of the sensor 10-1 detects a sound exhibiting a frequency characteristic T1 obtained by synthesizing the stationary sound and the voice of the user SA2.
  • the microphone of the sensor 10-2 detects the sound showing the frequency characteristic T2 of only the stationary sound.
  • the microphone of the sensor 10-3 detects a sound exhibiting a frequency characteristic T3 in which the stationary sound and the cry of the animal SA3 are combined.
  • The microphone of the sensor 10-4 detects a sound exhibiting a frequency characteristic T4 in which the stationary sound and the voice of the user SA4 are combined.
  • the steady sound reduction unit 231N reduces the steady sound from the sounds detected by the sensors 10-1 to 10-4.
  • the voice of user SA2 is extracted from the sound detected by the microphone of sensor 10-1.
  • Nothing is extracted from the sound detected by the microphone of sensor 10-2.
  • Only the cry of the animal SA3 is extracted from the sounds detected by the microphone of the sensor 10-3.
  • Only the voice of user SA4 is extracted from the sound detected by the microphone of sensor 10-4.
  • a characteristic sound is extracted from the sounds extracted from each of the sensors 10-1 to 10-4 by the characteristic sound extraction unit 2313.
  • the voice of the user SA2 is extracted as the characteristic sound from the sound detected by the microphone of the sensor 10-1.
  • No characteristic sound is extracted from the sound detected by the microphone of the sensor 10-2.
  • the bark of the animal SA3 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-3.
  • the voice of the user SA4 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-4.
  • the position specifying unit 2314 specifies that the user SA2 exists at the position of the sensor 10-1 (near the position of the sensor 10-1). Position specifying unit 2314 specifies that animal SA3 exists at the position of sensor 10-3 (near the position of sensor 10-3). Position specifying unit 2314 specifies that user SA4 exists at the position of sensor 10-4 (near the position of sensor 10-4).
  • the selection unit 232 selects the speaker 30-1 installed near the location where the user to be notified, that is, the user SA2 exists.
  • the output unit 233 causes the speaker 30-1 selected by the selection unit 232 to output the cry of the animal SA3 extracted from the microphone of the sensor 10-3.
  • the steady sound reduction unit 231N outputs the sound detected by the microphone as it is to the characteristic sound extraction unit 2313 without reducing the steady sound when the level of the steady sound is greater than a predetermined threshold.
  • When the feature sound extraction unit 2313 extracts the feature sound from the sound detected by the microphone using a feature sound estimation model, it may use different feature sound estimation models depending on whether or not the stationary sound has been reduced. For example, when the stationary sound has been reduced, the feature sound extraction unit 2313 uses a model for estimating the feature sound from a sound from which the stationary sound has been reduced. On the other hand, when the stationary sound has not been reduced, the feature sound extraction unit 2313 uses a model for estimating the feature sound from a sound from which the stationary sound has not been reduced.
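A minimal sketch of switching between the two feature sound estimation models; the model objects and their predict() interface are assumed for illustration.

```python
def extract_feature_sound(sound, stationary_reduced: bool, model_reduced, model_raw):
    """Use the model trained on stationary-reduced sound only when reduction was actually applied."""
    model = model_reduced if stationary_reduced else model_raw
    return model.predict(sound)   # hypothetical interface returning the estimation result
```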
  • FIG. 14 is a flowchart explaining the flow of processing performed by the control device 20 of the embodiment.
  • The control device 20 acquires information (sensor information) acquired by the sensor 10 (step S1). When the sensor information includes sound, the control device 20 determines whether or not the sound is a notification sound (step S2). When the sensor 10 is a microphone, or when some of the plurality of sensors 10 are microphones, the control device 20 determines whether or not the sound is a notification sound by, for example, comparing the frequency characteristics of the sound collected by the microphone with those of the sounds stored in the storage unit 22 as the notification sound information 221.
  • When determining that the sound is a notification sound, the control device 20 extracts the sensor information used to identify the user's position (step S3). For example, when specifying the user's position using an image, the control device 20 extracts information of the image acquired by the image sensor. Alternatively, when the user's position is specified using temperature, the control device 20 extracts information indicating the temperature acquired by the infrared sensor or the temperature sensor. When identifying the position of the user using sound, the control device 20 extracts information indicating the sound collected by the microphone. In the following flow, a case will be described in which the control device 20 identifies the position of the user using sound.
  • the control device 20 reduces stationary sounds from the sounds collected by the microphone (step S4).
  • the control device 20 extracts a characteristic sound from the sound from which the stationary sound has been reduced (step S5).
  • the control device 20 identifies the position of the user from the extracted characteristic sounds (step S6).
  • the control device 20 selects the speaker 30 for outputting the notification sound based on the identified position of the user (step S7).
  • the control device 20 outputs a notification sound from the selected speaker 30 (step S8).
  • If it is determined in step S2 that the sound is not a notification sound, the control device 20 ends the process.
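A minimal sketch of the flow of FIG. 14 for the sound-based case; every helper function below is a hypothetical placeholder for the corresponding step, not an API from the publication.

```python
def control_flow(sensor_information: dict) -> None:
    """Steps S1-S8 of FIG. 14, assuming sound-based position identification."""
    sound = sensor_information.get("sound")                    # S1: acquire sensor information
    if sound is None or not looks_like_notification(sound):    # S2: is it a notification sound?
        return                                                 # no: end the process
    mic_sounds = collect_microphone_sounds(sensor_information)                 # S3: extract sensor info
    reduced = {mic: subtract_stationary(s) for mic, s in mic_sounds.items()}   # S4: reduce stationary sound
    features = {mic: pick_feature_sound(s) for mic, s in reduced.items()}      # S5: extract feature sound
    user_position = locate_user(features)                      # S6: identify the user's position
    speaker_no = select_speaker(user_position)                 # S7: select the nearby speaker
    play_on_speaker(speaker_no, sound)                         # S8: output the notification sound
```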
  • FIG. 15 is a flowchart illustrating the flow of processing for generating a machine learning model according to the embodiment.
  • The position estimation model is a model that estimates the user's position.
  • Here, a case where the position estimation model estimates whether or not the sound collected by the microphone is a sound caused by the user will be described as an example.
  • the learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11).
  • the learning unit 235 generates a learning data set (step S12).
  • the learning data set here is information in which a label indicating whether or not the sound is caused by the user is added to the sound collected by the microphone.
  • The learning unit 235 determines whether or not all the sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not been acquired, the learning unit 235 returns to step S11.
  • When all the sensor information for machine learning has been acquired, the learning unit 235 generates a position estimation model (learned model) (step S14).
  • the learning unit 235 generates a position estimation model (learned model) by having a machine learning model such as a CNN (Convolutional Neural Network) learn the learning data set.
  • When the learning unit 235 inputs a sound collected by the microphone from the learning data set into a machine learning model such as a CNN, the machine learning model is repeatedly trained on the learning data set while its parameters are adjusted so that the value output from the model approaches the label attached to that sound (the label indicating whether or not the sound is caused by the user). This makes it possible to generate a model capable of accurately estimating whether or not a sound collected by the microphone is caused by the user.
  • the learning unit 235 stores the generated position estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
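A minimal sketch of such a training step, assuming PyTorch, a tiny 1-D CNN over fixed-length sound feature frames, and binary labels; the architecture and hyperparameters are illustrative only and are not taken from the publication.

```python
import torch
import torch.nn as nn

class SoundLabelCNN(nn.Module):
    """Tiny 1-D CNN mapping a sound feature frame to a single logit (label: caused by the user or not)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(1, 16, kernel_size=5, padding=2),
                                  nn.ReLU(),
                                  nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Linear(16, 1)

    def forward(self, x):                      # x: (batch, 1, n_feature_bins)
        return self.fc(self.conv(x).squeeze(-1))

def train_position_model(model: nn.Module, loader, epochs: int = 10) -> None:
    """Adjust the parameters so the model output approaches the attached label (step S14)."""
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for sounds, labels in loader:          # labels: 1.0 if the sound is caused by the user
            optimiser.zero_grad()
            loss = loss_fn(model(sounds).squeeze(-1), labels.float())
            loss.backward()
            optimiser.step()
```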
  • a feature sound estimation model is a model for estimating a feature sound.
  • Here, a case where the characteristic sound estimation model estimates whether or not the sound collected by the microphone is a characteristic sound will be described as an example.
  • the learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11).
  • the learning unit 235 generates a learning data set (step S12).
  • the learning data set here is information in which a label indicating whether or not the sound is a characteristic sound is attached to the sound collected by the microphone.
  • The learning unit 235 determines whether or not all the sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not been acquired, the learning unit 235 returns to step S11.
  • When all the sensor information for machine learning has been acquired, the learning unit 235 generates a feature sound estimation model (learned model) (step S14).
  • the learning unit 235 generates a feature sound estimation model (learned model) by having a machine learning model such as CNN learn the learning data set.
  • When the learning unit 235 inputs a sound collected by the microphone from the learning data set into a machine learning model such as a CNN, the machine learning model is repeatedly trained on the learning data set while its parameters are adjusted so that the value output from the model approaches the label attached to that sound (the label indicating whether or not the sound is a characteristic sound). This makes it possible to generate a model capable of accurately estimating whether or not a sound collected by the microphone is a characteristic sound.
  • the learning unit 235 stores the generated feature sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
  • the notification sound estimation model is a model for estimating the notification sound.
  • the notification sound estimation model estimates whether or not the sound collected by the microphone is the notification sound.
  • the learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11).
  • the learning unit 235 generates a learning data set (step S12).
  • the learning data set here is information in which a label indicating whether or not the sound is a notification sound is attached to the sound collected by the microphone.
  • The learning unit 235 determines whether or not all the sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not been acquired, the learning unit 235 returns to step S11.
  • When all the sensor information for machine learning has been acquired, the learning unit 235 generates a notification sound estimation model (learned model) (step S14).
  • the learning unit 235 generates a notification sound estimation model (learned model) by causing a machine learning model such as CNN to perform machine learning on the learning data set.
  • When the learning unit 235 inputs a sound collected by the microphone from the learning data set into a machine learning model such as a CNN, the machine learning model is repeatedly trained on the learning data set while its parameters are adjusted so that the value output from the model approaches the label attached to that sound (the label indicating whether or not the sound is a notification sound). This makes it possible to generate a model capable of accurately estimating whether or not a sound collected by the microphone is a notification sound.
  • the learning unit 235 stores the generated notification sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
  • the control device 20 controls sounds to be output from each of the plurality of speakers 30 provided in the space.
  • the control device 20 includes an acquisition unit 230 , an identification unit 231 , a selection unit 232 and an output unit 233 .
  • the acquisition unit 230 acquires sensor information (detection results) detected by the plurality of sensors 10 provided in the space.
  • the specifying unit 231 specifies the position of the user existing in the space based on the detection result acquired by the acquiring unit 230 .
  • the selection unit 232 selects the speaker 30 installed near the position of the user identified by the identification unit 231 from the plurality of speakers 30 .
  • the selection unit 232 may select the speaker 30 installed closest to the user's position from among the plurality of speakers 30 .
  • the output unit 233 outputs sound from the speaker selected by the selection unit 232 .
  • the specifying unit 231 specifies the user's position based on the output obtained by inputting the detection result into the position estimation model.
  • The position estimation model is a trained model that has learned, using a learning data set in which each detection result is labeled as to whether or not it was detected due to the presence of the user, the correspondence between a detection result and whether or not that detection result was detected due to the presence of the user.
  • the sensor 10 may be a microphone.
  • the specifying unit 231 specifies the user position based on the output obtained by inputting the detection result into the position estimation model.
  • The position estimation model is a model generated by performing learning using a learning data set in which each sound for machine learning is labeled as to whether or not it is caused by the presence of the user, and is a model for estimating whether or not a sound detected by the microphone is caused by the presence of the user.
  • The sensor 10 may be a microphone.
  • When the identifying unit 231 detects the user's voice in the sound collected by the microphone, the identifying unit 231 determines that the user is present in the vicinity of that microphone, and specifies the location of the microphone as the location of the user.
  • the sensor 10 may include a microphone.
  • the specifying unit 231 determines whether or not the sound is a notification sound to notify the user based on the characteristics of the sound collected by the microphone.
  • the output unit 233 causes the speaker 30 selected by the selection unit 232 to output the sound determined to be the notification sound by the identification unit 231 (an example of the determination unit).
  • the sensor 10 may include a microphone.
  • the identification unit 231 determines whether or not the sound collected by the microphone is the notification sound based on the output obtained by inputting the sound collected by the microphone into the notification sound estimation model.
  • the notification sound estimation model is a model generated by performing learning using a learning data set in which a machine learning sound is labeled to indicate whether or not the sound is a notification sound to be notified to the user. This is a model for estimating whether or not a sound detected by a microphone is a notification sound.
  • FIG. 16 is a flowchart for explaining the flow of processing for performing additional learning according to Modification 1 of the embodiment.
  • the control device 20 acquires sensor information for estimation, here information about sound collected by the microphone (step S21).
  • the controller 20 uses the notification sound estimation model (learned model) to estimate whether or not the sound collected by the microphone is the notification sound (step S22).
  • the control device 20 determines whether or not the estimation by the notification sound estimation model is correct (step S23).
  • the control device 20 determines whether or not the estimation by the notification sound estimation model is correct, for example, based on information input by the user operating a keyboard or the like.
  • If the estimation by the notification sound estimation model is incorrect, the control device 20 generates a learning data set for additional learning (step S24).
  • the learning data set for additional learning is information in which correct labels are attached to sounds that are incorrectly estimated by the notification sound estimation model, indicating whether or not the sounds are notification sounds.
  • the control device 20 determines whether or not to perform additional learning (step S25). For example, the control device 20 determines to perform additional learning when the number of learning data sets for additional learning generated in step S24 reaches a predetermined number. Alternatively, the control device 20 may determine to perform additional learning when the probability that the notification sound estimation model makes an erroneous estimation is greater than or equal to a predetermined value.
  • When performing additional learning, the control device 20 performs additional learning by training the model on the learning data set for additional learning, and updates the notification sound estimation model (learned model) (step S26). The control device 20 stores the updated notification sound estimation model (learned model) (step S27).
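A minimal sketch of the additional-learning loop of FIG. 16, assuming the user's correction arrives as a boolean and that retraining and saving are handled by hypothetical helpers.

```python
def additional_learning_step(model, mic_sound, user_says_notification: bool,
                             buffer: list, batch_size: int = 32) -> None:
    """Steps S21-S27: collect wrongly estimated examples and retrain once enough have accumulated."""
    estimated = estimate_notification_sound(model, mic_sound)      # S22 (hypothetical helper)
    if estimated != user_says_notification:                        # S23: estimation was incorrect
        buffer.append((mic_sound, user_says_notification))         # S24: attach the correct label
    if len(buffer) >= batch_size:                                  # S25: decide to perform additional learning
        retrain(model, buffer)                                     # S26: update the learned model
        buffer.clear()
        save_model(model)                                          # S27: store the updated model
```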
  • In this way, the acquisition unit 230 obtains the result of the user's determination as to whether or not the sound output by the output unit 233 is a notification sound.
  • a learning unit 235 performs additional learning for the notification sound estimation model based on the determination result acquired by the acquisition unit 230 .
  • additional learning can be performed on the notification sound estimation model, and if there is an error in the estimation, the error in the estimation can be corrected.
  • the added notification sound can be machine-learned by the notification sound estimation model.
  • control device 20 can also perform additional learning on the position estimation model and the feature sound estimation model using a similar method.
  • FIG. 17 shows an example in which a plurality of users U1 and U2 are present in a house H.
  • This modification differs from the above-described embodiment in that the user U1 possesses a transmitting terminal T.
  • The transmitting terminal T is a terminal device that transmits a signal indicating the presence of the user U1, and is, for example, a smartphone, a beacon terminal, or the like.
  • Acquisition unit 230 acquires a signal (position signal) transmitted from transmitting terminal T.
  • The specifying unit 231 specifies the position of the user U1 (that is, the position of the transmitting terminal T) based on the signal (position signal) acquired by the acquiring unit 230.
  • a plurality of users U1 and U2 may exist in the space. At least one user U1 among the plurality of users U1 and U2 possesses a transmission terminal T that transmits a position signal indicating the position of user U1.
  • Acquisition unit 230 acquires a signal (position signal) transmitted from transmitting terminal T.
  • The specifying unit 231 specifies the position of the user U1 based on the signal (position signal) acquired by the acquisition unit 230.
  • In this way, the position of the user U1 possessing the transmitting terminal T can be accurately specified based on the signal transmitted from the transmitting terminal T.
  • Modification 3 of the embodiment will now be described. In this modified example, it is assumed that the user listens to a sound that continues for a predetermined period of time, such as music. This modification differs from the above-described embodiment in that, when the user moves in the space, the sound is output following that movement.
  • In this modification, the identification unit 231 determines, based on the characteristics of the sound collected by the microphone, whether the sound is a sound to be output following the movement of the user (that is, a sound for which the output speaker 30 is changed following the movement of the user). If the sound collected by the microphone is a sound to be output following the movement of the user, the identification unit 231 identifies the user near the microphone that collected the sound. For example, the identification unit 231 identifies the user near the microphone based on a characteristic sound collected together with the sound to be output following the movement of the user. The position of the user is repeatedly specified while the sound to be output following the movement of the user is being acquired by the microphone.
  • the selection unit 232 selects the speaker 30 installed near the position identified this time.
  • the output unit 233 stops the sound output from the speaker 30 selected last time by the selection unit 232, and causes the speaker 30 selected this time to output the sound following the movement of the user.
  • the sensor 10 may include a microphone.
  • the specifying unit 231 determines whether or not the sound collected by the microphone is the sound to be output following the movement of the user.
  • the identifying unit 231 identifies a user near the microphone that has collected the sound determined to be the sound to be output following the movement of the user.
  • the specifying unit 231 associates the specified position of the user with the sound to be output following the movement of the user, and repeatedly specifies the position of the user.
  • the selection unit 232 selects the speaker 30 installed near the position of the user identified this time.
  • the output unit 233 stops the sound output from the speaker 30 selected last time by the selection unit 232, and outputs the sound to follow the movement of the user from the speaker 30 selected this time.
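A minimal sketch of handing playback over between speakers as the user moves, reusing the nearest-speaker selection sketched earlier; the speaker objects and their play()/stop() methods are assumptions.

```python
class FollowingPlayback:
    """Keeps a long-running sound (e.g. music) on the speaker nearest the user's latest position."""
    def __init__(self, speakers: dict):
        self.speakers = speakers          # speaker No -> object with play(stream) / stop()
        self.current = None               # speaker selected last time

    def update(self, user_position, stream) -> None:
        selected = select_speaker(user_position)        # speaker selected this time
        if selected != self.current:
            if self.current is not None:
                self.speakers[self.current].stop()      # stop output from the previously selected speaker
            self.speakers[selected].play(stream)        # continue the sound on the newly selected speaker
            self.current = selected
```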
  • All or part of the speaker system 1 and the control device 20 in the above-described embodiment may be realized by a computer.
  • a program for realizing this function may be recorded in a computer-readable recording medium, and the program recorded in this recording medium may be read into a computer system and executed.
  • the “computer system” here includes hardware such as an OS and peripheral devices.
  • the term "computer-readable recording medium” refers to portable media such as flexible discs, magneto-optical discs, ROMs and CD-ROMs, and storage devices such as hard discs incorporated in computer systems.
  • The "computer-readable recording medium" may also include a medium that dynamically retains the program for a short period of time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that retains the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or client in that case. Further, the program may be one for realizing a part of the functions described above, or may be one that realizes the functions described above in combination with a program already recorded in the computer system. The functions may also be realized using a programmable logic device such as an FPGA.
  • the present disclosure may be applied to control methods, control devices, and programs.

Abstract

This control method is implemented by a computer and comprises: acquiring a detection result detected by at least one sensor placed in a space; determining the position of a user in the space on the basis of the detection result; selecting, from among a plurality of speakers placed in the space, a speaker placed in the vicinity of the determined position of the user; and causing the selected speaker to output sounds.

Description

Control method, control device, and program
The present invention relates to a control method, a control device, and a program.
This application claims priority based on Japanese Patent Application No. 2021-052696 filed on March 26, 2021, the entire disclosure of which is incorporated herein.
For example, Patent Literature 1 discloses a technology related to an information processing apparatus capable of responding in accordance with the user's intention.
International Publication No. WO 2019/082630
In the technology described in Patent Literature 1, feedback from the home console is output as voice through the user's interaction with a home console installed at a predetermined location. For example, when the user utters "Please heat the bath" and the bath has not finished heating by the time the user comes home, that fact is fed back to the user. However, the user is likely to move away after giving an instruction to the home console at a certain location. In such a case, since the user is no longer near the home console, there is a problem that the feedback from the home console cannot be heard.
The present invention has been made in view of such circumstances. An example of an object of the present invention is to provide a control method, a control device, and a program capable of selecting a speaker as the sound output destination so that the user can hear a sound notified to the user even when the user moves.
In order to solve the above-described problems, one aspect of the present invention is a control method performed by a computer, the method including: acquiring a detection result detected by at least one sensor provided in a space; specifying a position of a user in the space based on the detection result; selecting, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and causing the selected speaker to output sound.
Another aspect of the present invention is a control device including: an acquisition unit that acquires a detection result detected by at least one sensor provided in a space; a specifying unit that specifies a position of a user in the space based on the detection result; a selection unit that selects, from a plurality of speakers installed in the space, a speaker installed near the specified position of the user; and an output unit that causes the selected speaker to output sound.
Another aspect of the present invention is a program that causes a computer to execute: acquiring a detection result detected by at least one sensor provided in a space; identifying the position of a user in the space based on the detection result; selecting, from among a plurality of speakers provided in the space, a speaker installed in the vicinity of the identified position of the user; and causing the selected speaker to output a sound.
According to the embodiments of the present invention, even when the user has moved, it is possible to select a speaker as the output destination of a sound so that the user can hear the sound notified to the user.
FIG. 1 is a diagram showing an overview of a speaker system 1 according to an embodiment.
FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 according to the embodiment.
FIG. 3 is a block diagram showing an example of the configuration of a control device 20 according to the embodiment.
FIG. 4 is a diagram showing an example of stationary sound information 220 according to the embodiment.
FIG. 5 is a diagram showing an example of notification sound information 221 according to the embodiment.
FIG. 6 is a diagram showing an example of installation information 222 according to the embodiment.
FIG. 7 is a diagram showing an example of the installation information 222 according to the embodiment.
FIG. 8 is a diagram for explaining processing performed by an identification unit 231 according to the embodiment.
FIG. 9 is a diagram for explaining processing performed by the identification unit 231 according to the embodiment.
FIG. 10 is a diagram for explaining a stationary sound according to the embodiment.
FIG. 11 is a diagram for explaining a characteristic sound according to the embodiment.
FIG. 12 is a diagram for explaining processing performed by the control device 20 according to the embodiment.
FIG. 13 is a diagram for explaining processing performed by the control device 20 according to the embodiment.
FIG. 14 is a flowchart for explaining the flow of processing performed by the control device 20 according to the embodiment.
FIG. 15 is a diagram for explaining processing of generating a learning model according to the embodiment.
FIG. 16 is a diagram for explaining additional learning according to Modification 1 of the embodiment.
FIG. 17 is a diagram for explaining Modification 2 of the embodiment.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a diagram showing an overview of the speaker system 1 according to the embodiment. As shown in the example of FIG. 1, the speaker system 1 is applied in a space where a user lives, such as a house H. When the speaker system 1 is applied to the house H, the house H is provided with a plurality of sensors 10-1 to 10-3, a control device 20, and a plurality of speakers 30-1 to 30-3. The sensors 10-1 to 10-3 (and sensors 10-1 to 10-N to be described later) are referred to as the sensors 10 when they are not distinguished from each other. The speakers 30-1 to 30-3 (and speakers 30-1 to 30-M to be described later) are referred to as the speakers 30 when they are not distinguished from each other. Each of the plurality of sensors 10 is provided in a different space in the house H. Specifically, the sensors 10 are provided in the entrance, the kitchen, the living room, the bedroom, and the like of the house H. Likewise, each of the plurality of speakers 30 is provided in a different space in the house H. A sensor 10 and a speaker 30 may both be provided in the same space. Alternatively, only one of a sensor 10 and a speaker 30 may be provided in a space such as the entrance, the kitchen, the living room, or the bedroom of the house H. The number of sensors 10 and the number of speakers 30 connected to the control device 20 may be determined arbitrarily.
FIG. 2 is a block diagram showing an example of the configuration of the speaker system 1 according to the embodiment. As shown in FIG. 2, the speaker system 1 includes, for example, a plurality of sensors 10-1 to 10-N, the control device 20, and a plurality of speakers 30-1 to 30-M. In the speaker system 1, the plurality of sensors 10, the control device 20, and the plurality of speakers 30 are communicably connected via a communication network NW.
The communication network NW may be a wide area network, that is, a WAN (Wide Area Network), the Internet, or a combination thereof. The communication network NW may be a network that is communicably connected by a wired connection such as a cable or by a wireless connection such as a wireless LAN.
The sensor 10 includes, for example, a sensor unit and a communication unit. The sensor unit acquires information from which the position and movement of the user can be detected. The communication unit transmits the information acquired by the sensor unit to the control device 20.
In this embodiment, the sensor unit is a microphone (see, for example, FIG. 9). When the sensor unit is a microphone, the sensor unit collects sound propagating in the space in which the sensor unit is provided. The position and movement of the user can be detected by analyzing the sound collected by the microphone. A method of detecting the position and movement of the user based on the sound collected by the microphone will be described later in detail.
The sensor unit is not limited to a microphone. The sensor unit may be any sensor capable of acquiring information from which the position and movement of the user can be detected. For example, the sensor unit may be an image sensor, an infrared sensor, a temperature sensor, an optical sensor, or the like. For example, when the sensor unit is an image sensor, the sensor unit captures an image of the space in which the sensor 10 is provided, and the position and movement of the user can be detected by analyzing the image. When the sensor unit is an infrared sensor, a temperature sensor, an optical sensor, or the like, the sensor unit detects, for example, a temperature measured by infrared light or a thermometer, or the phase difference between emitted light and its reflected light. The position and movement of the user can be detected based on changes in the temperature, the phase difference, and the like.
The speaker 30 is connected to the control device 20 and outputs sound under the control of the control device 20. The speaker 30 may be any speaker device as long as it can output sound at least under the control of the control device 20.
The control device 20 is, for example, a computer device such as a PC (Personal Computer) or a server device. The control device 20 receives the information acquired by each of the sensors 10. The control device 20 identifies the position of the user based on the acquired information. The control device 20 causes a speaker 30 provided near the identified position of the user to output sound.
FIG. 3 is a block diagram showing an example of the configuration of the control device 20 according to the embodiment. The control device 20 includes, for example, a communication unit 21, a storage unit 22, and a control unit 23. The communication unit 21 is realized by, for example, a general-purpose communication IC (Integrated Circuit). The communication unit 21 communicates with the sensors 10 and the speakers 30 via the communication network NW.
The storage unit 22 is realized by, for example, a storage device (a storage device including a non-transitory storage medium) such as an HDD (Hard Disk Drive) or a flash memory, or a combination thereof. The storage unit 22 stores a program for realizing each component (each function) of the control device 20, variables used when executing the program, and various kinds of information. The storage unit 22 stores, for example, stationary sound information 220, notification sound information 221, installation information 222, and learned model information 223. The details of the information stored in the storage unit 22 will be described later.
The control unit 23 is realized by causing a CPU provided as hardware in the control device 20 to execute a program. The control unit 23 includes, for example, an acquisition unit 230, an identification unit 231, a selection unit 232, an output unit 233, a device control unit 234, and a learning unit 235.
The acquisition unit 230 acquires the information acquired by the sensors 10 via the communication unit 21, and outputs the acquired information to the identification unit 231.
The identification unit 231 identifies the position of the user based on the information acquired from the acquisition unit 230. The method by which the identification unit 231 identifies the position of the user will be described later in detail. The identification unit 231 outputs information indicating the identified position of the user to the selection unit 232.
The identification unit 231 also identifies a notification sound based on the information acquired from the acquisition unit 230. The notification sound is a sound that should be notified to the user, for example, a ringing sound of an intercom. The identification unit 231 identifies, as the notification sound, a sound whose frequency characteristic is close to the frequency characteristic of a sound stored in advance in the storage unit 22 as the notification sound information 221. The identification unit 231 is an example of a "determination unit".
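The comparison of frequency characteristics described above can be illustrated with a short sketch. The following Python fragment is only an illustrative example, not the disclosed implementation: it assumes that the notification sound information 221 is available as a set of magnitude spectra and uses cosine similarity with an arbitrary threshold, neither of which is prescribed by this description.

```python
import numpy as np

def is_notification_sound(observed, stored_spectra, threshold=0.9):
    """Return True if the observed magnitude spectrum is close to any spectrum
    registered as notification sound information 221 (hypothetical layout).

    observed       : 1-D array, magnitude spectrum of the collected sound
    stored_spectra : iterable of 1-D arrays of the same length
    threshold      : similarity above which the sound is treated as a
                     notification sound (arbitrary value for illustration)
    """
    obs = observed / (np.linalg.norm(observed) + 1e-12)
    for reference in stored_spectra:
        ref = reference / (np.linalg.norm(reference) + 1e-12)
        if float(np.dot(obs, ref)) >= threshold:
            return True
    return False
```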
The selection unit 232 selects a speaker 30 close to the position of the user based on the information indicating the position of the user acquired from the identification unit 231. For example, the selection unit 232 refers to the installation information 222 based on the information indicating the position of the user. The installation information 222 is information indicating the positions where the speakers 30 are installed. The selection unit 232 compares the position of the user with each of the positions where the speakers 30 are installed, and selects a speaker 30 close to the position of the user. For example, the selection unit 232 selects, from among the plurality of speakers 30, the speaker 30 closest to the user. The selection unit 232 outputs information indicating the selected speaker 30 to the output unit 233.
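As a concrete illustration of this selection, the following sketch assumes that the installation information 222B is available as two-dimensional coordinates for each speaker and that the identified position of the user is expressed in the same coordinate system; the description does not prescribe a particular distance measure, so Euclidean distance is used here.

```python
import math

def select_nearest_speaker(user_pos, speaker_positions):
    """Select the speaker installed closest to the identified user position.

    user_pos          : (x, y) coordinates of the user
    speaker_positions : dict mapping a speaker identifier to its (x, y)
                        installation position (cf. installation information 222B)
    """
    def distance(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    return min(speaker_positions, key=lambda sp: distance(user_pos, speaker_positions[sp]))

# Example: speaker "30-1" is selected for a user standing near (1.0, 2.0).
speakers = {"30-1": (0.5, 2.0), "30-2": (6.0, 2.0), "30-3": (3.0, 8.0)}
print(select_nearest_speaker((1.0, 2.0), speakers))
```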
Based on the information indicating the speaker 30 acquired from the selection unit 232, the output unit 233 causes the speaker 30 indicated by that information to output sound. The sound that the output unit 233 causes the speaker 30 to output may be any sound as long as it should be notified to the user. For example, the output unit 233 causes the speaker 30 to output a notification sound of an intercom or the like, or music that the user is listening to.
The output unit 233 may also set, among the plurality of speakers 30, the speakers 30 that are not outputting sound to a standby mode.
The device control unit 234 comprehensively controls the control device 20. For example, the device control unit 234 outputs the information received by the communication unit 21, that is, the information acquired from the sensors 10, to the acquisition unit 230. The device control unit 234 also outputs the information output by the output unit 233, that is, information indicating the sound to be output by the speaker 30, to the communication unit 21. As a result, the sound is output from the speaker 30.
The learning unit 235 generates a learned model by training a machine learning model. The learning unit 235 generates a model for estimating the position of the user (hereinafter referred to as a position estimation model). The position estimation model is a model for estimating the position of the user from sounds collected by the microphones serving as the sensors 10. The method by which the learning unit 235 creates the position estimation model will be described later in detail.
The learning unit 235 also generates a model for estimating a characteristic sound (hereinafter referred to as a characteristic sound estimation model). The characteristic sound estimation model is a model for estimating whether or not a sound includes a characteristic sound. A characteristic sound here is a characteristic sound caused by the position or movement of a user. The characteristic sound may be a characteristic sound of an unspecified user or a characteristic sound of a specific user.
A characteristic sound of an unspecified user is a sound that, when a plurality of users live in the house H, does not allow an individual user to be identified, but is nevertheless a characteristic sound caused by the position or movement of a user. For example, characteristic sounds of an unspecified user include the sound of the living-room door being opened or closed, and the startup sound or operating sound generated when a television or the like is operated.
A characteristic sound of a specific user is a characteristic sound caused by the position or movement of an individual user when a plurality of users live in the house H. For example, characteristic sounds of a specific user include a sound uttered by the specific user and the footsteps of the specific user.
For a sound input to the model, the characteristic sound estimation model outputs an estimation result indicating whether or not the sound is a characteristic sound indicating the position or movement of a specific user. The method by which the learning unit 235 creates the characteristic sound estimation model will be described later in detail.
The learning unit 235 also generates a model for estimating a notification sound (hereinafter referred to as a notification sound estimation model). The notification sound estimation model is a model for estimating, from the sounds collected by the microphones serving as the sensors 10, a sound that should be notified to the user. The method by which the learning unit 235 creates the notification sound estimation model will be described later in detail.
FIG. 4 is a diagram showing an example of the stationary sound information 220 according to the embodiment. The stationary sound information 220 is information about stationary sounds. A stationary sound is a sound that is constantly generated in a space regardless of whether or not a user is present in the space. For example, if the space is a kitchen and a ventilation fan provided in the kitchen is always rotating, the sound of its rotation is a stationary sound. The stationary sound information 220 includes, for example, items such as a stationary sound identifier (shown as stationary sound No. in FIG. 4) and a frequency characteristic. The stationary sound identifier is identification information, such as a number, that uniquely identifies a stationary sound. The frequency characteristic is information indicating the frequency characteristic of the stationary sound identified by the stationary sound identifier, for example, the magnitude (gain) of each frequency band of the sound components included in the stationary sound.
FIG. 5 is a diagram showing an example of the notification sound information 221 according to the embodiment. The notification sound information 221 is information about notification sounds. A notification sound is a sound that should be notified to the user, for example, a ringing sound that sounds when an intercom is operated. The notification sound information 221 includes, for example, items such as a notification sound identifier (shown as notification sound No. in FIG. 5) and a frequency characteristic. The notification sound identifier is identification information, such as a number, that uniquely identifies a notification sound. The frequency characteristic is information indicating the frequency characteristic of the notification sound identified by the notification sound identifier.
The notification sound information 221 may also be generated for each user. For example, when a parent and a child live in the house H, the child's voice is a notification sound for the parent, who is the user.
FIGS. 6 and 7 are diagrams showing examples of the installation information 222 according to the embodiment. The installation information 222 is information indicating the installation positions of the sensors 10 and the speakers 30 installed in the house H. FIG. 6 shows an example of installation information 222A indicating the installation positions of the sensors 10. FIG. 7 shows an example of installation information 222B indicating the installation positions of the speakers 30.
The installation information 222A shown in FIG. 6 includes, for example, items such as a sensor identifier (shown as sensor No. in FIG. 6) and an installation position. The sensor identifier is identification information, such as a number, that uniquely identifies a sensor 10. The installation position is information indicating the installation position of the sensor 10 identified by the sensor identifier.
The installation information 222B shown in FIG. 7 includes, for example, items such as a speaker identifier (shown as speaker No. in FIG. 7) and an installation position. The speaker identifier is identification information, such as a number, that uniquely identifies a speaker 30. The installation position is information indicating the installation position of the speaker 30 identified by the speaker identifier.
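The three tables described above can be represented in memory, for example, by simple records. The following sketch is only one possible layout with hypothetical field names; the description above defines the items of each table but not how they are stored.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class StationarySound:              # stationary sound information 220
    sound_no: int                   # stationary sound identifier
    spectrum: List[float]           # gain for each frequency band

@dataclass
class NotificationSound:            # notification sound information 221
    sound_no: int                   # notification sound identifier
    spectrum: List[float]           # frequency characteristic of the notification sound

@dataclass
class Installation:                 # installation information 222A / 222B
    device_no: int                  # sensor or speaker identifier
    position: Tuple[float, float]   # installation position
```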
(Method for identifying the position of the user)
Here, the method by which the identification unit 231 identifies the position of the user will be described with reference to FIGS. 8 and 9. FIGS. 8 and 9 are diagrams for explaining the processing performed by the identification unit 231 in the embodiment.
As shown in FIG. 8, the identification unit 231 includes, for example, a plurality of stationary sound reduction units 2310, 2311, and 2312, a characteristic sound extraction unit 2313, and a position identification unit 2314.
Information detected by each of the plurality of sensors 10 is input to the plurality of stationary sound reduction units 2310, 2311, and 2312, respectively. In the following description, when the stationary sound reduction units 2310, 2311, and 2312 are not distinguished from each other, they are referred to as a stationary sound reduction unit 231N.
The stationary sound reduction unit 231N reduces the stationary sound from the sound detected by the sensor 10. For example, the stationary sound reduction unit 231N periodically acquires the sound detected by the sensor 10, and performs a Fourier transform on the acquired signal representing the time-series change of the sound, thereby generating a sensor sound frequency characteristic indicating the frequency characteristic of the signal. The stationary sound reduction unit 231N refers to the stationary sound information 220 to acquire the frequency characteristic of the stationary sound, and subtracts the acquired frequency characteristic of the stationary sound from the sensor sound frequency characteristic.
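A minimal sketch of this frequency-domain subtraction is shown below. It assumes that one frame of the sensor sound and the registered stationary sound are available as arrays of matching length; the frame length and the clipping of negative values are arbitrary choices made only for illustration.

```python
import numpy as np

def reduce_stationary_sound(frame, stationary_spectrum):
    """Subtract the stationary sound spectrum from one frame of sensor sound.

    frame               : 1-D array of time-domain samples from the microphone
    stationary_spectrum : magnitude spectrum registered as stationary sound
                          information 220 (same length as rfft of the frame)
    """
    spectrum = np.fft.rfft(frame)                 # sensor sound frequency characteristic
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    reduced = np.maximum(magnitude - stationary_spectrum, 0.0)  # clip negative gains
    return np.fft.irfft(reduced * np.exp(1j * phase), n=len(frame))
```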
Alternatively, the stationary sound reduction unit 231N may reduce the stationary sound by applying a filter to the sound detected by the sensor 10. The filter here is a filter having a characteristic that attenuates the frequency band corresponding to the stationary sound. In this case, the stationary sound information 220 includes, for example, an item such as a filter characteristic. The filter characteristic indicates the characteristic of a filter that attenuates the frequency components corresponding to the stationary sound; for example, when the filter is a digital filter, the filter characteristic is information indicating the filter configuration and coefficients. The stationary sound reduction unit 231N refers to the stationary sound information 220 to acquire the filter characteristic for reducing the stationary sound, and generates a filter for reducing the stationary sound based on the acquired filter characteristic. The stationary sound reduction unit 231N reduces the stationary sound from the sound detected by the sensor 10 by applying the generated filter to that sound.
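The filter-based alternative can be sketched as follows. This fragment assumes that the stationary sound is concentrated around a single known frequency and uses a notch filter from SciPy; a filter characteristic actually stored in the stationary sound information 220 could, of course, be more elaborate.

```python
from scipy.signal import iirnotch, lfilter

def make_stationary_sound_filter(center_hz, sample_rate_hz, q=30.0):
    """Design a notch filter that attenuates the frequency band of a stationary
    sound (for example, the hum of a ventilation fan around 120 Hz)."""
    b, a = iirnotch(w0=center_hz, Q=q, fs=sample_rate_hz)
    return b, a

def apply_stationary_sound_filter(b, a, samples):
    """Apply the generated filter to the sound detected by the sensor."""
    return lfilter(b, a, samples)
```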
The stationary sound reduction unit 231N outputs the sound obtained by reducing the stationary sound from the sound detected by the sensor 10 to the characteristic sound extraction unit 2313 and the position identification unit 2314.
The characteristic sound extraction unit 2313 acquires the sound from which the stationary sound has been reduced from the stationary sound reduction unit 231N. The characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound, and outputs the determination result. The characteristic sound extraction unit 2313 uses, for example, the characteristic sound estimation model to determine whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound.
The characteristic sound extraction unit 2313 inputs the sound acquired from the stationary sound reduction unit 231N into the characteristic sound estimation model, and obtains the result output by the characteristic sound estimation model. The result obtained from the characteristic sound estimation model is an estimation of whether or not the sound input to the model includes a characteristic sound. The characteristic sound extraction unit 2313 outputs the result obtained from the characteristic sound estimation model to the position identification unit 2314.
The position identification unit 2314 acquires the sound from which the stationary sound has been reduced from the stationary sound reduction unit 231N. The position identification unit 2314 also acquires, from the characteristic sound extraction unit 2313, information indicating whether or not the sound includes a characteristic sound. When the sound includes a characteristic sound, the position identification unit 2314 estimates the position and movement of the user based on the sounds acquired from each of the stationary sound reduction units 231N.
The processing by which the position identification unit 2314 estimates the position and movement of the user will be described with reference to FIG. 9. FIG. 9 schematically shows four microphones provided as the sensors 10-1 to 10-4 at the four corners of a space in the house H. FIG. 9 also schematically shows the user moving from the position of user U# toward the position of user U at the center of the space.
In the example of this figure, when the user moves from the position of user U# to the position of user U, the characteristic sound (the sound of the user's movement) acquired by the microphones of the sensors 10-1 and 10-2 gradually becomes quieter. On the other hand, the characteristic sound acquired by the microphones of the sensors 10-3 and 10-4 gradually becomes louder. The position identification unit 2314 detects the position and movement of the user based on such changes in the loudness of the characteristic sound.
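The behaviour described above, in which the characteristic sound becomes louder at the microphones the user approaches and quieter at the ones the user leaves, can be illustrated with the following sketch. It simply treats the microphone observing the loudest characteristic sound as the one nearest to the user; the actual position identification unit 2314 may of course use more elaborate processing.

```python
import numpy as np

def rms_level(samples):
    """Root-mean-square level of an extracted characteristic sound."""
    samples = np.asarray(samples, dtype=float)
    return float(np.sqrt(np.mean(samples ** 2)))

def estimate_nearest_sensor(feature_levels):
    """Estimate which sensor (microphone) the user is closest to.

    feature_levels : dict mapping a sensor identifier to the level of the
                     characteristic sound extracted from that microphone
    """
    return max(feature_levels, key=feature_levels.get)
```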
Here, the stationary sound will be described with reference to FIG. 10. A stationary sound is, for example, a sound (such as noise) that is constantly generated at a certain position. A stationary sound is detected by averaging the frequency characteristics of the sound collected for a certain period of time (for example, one hour) by the microphone provided at that position. Stationary sounds detected in this way include, for example, a sound of the same frequency that is continuously output for a certain period of time, the hum of a ventilation fan, and natural sounds such as running water. The example of FIG. 10 shows that a stationary sound is output from a sound source SA1 and that the frequency characteristic of the stationary sound is a characteristic N1.
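A sketch of how such a stationary sound could be measured is shown below: the magnitude spectra of successive frames collected over a long period are averaged, which preserves components that are always present and smooths out transient ones. The frame length and the averaging period are arbitrary choices made only for illustration.

```python
import numpy as np

def estimate_stationary_spectrum(samples, frame_len=1024):
    """Average the magnitude spectra of consecutive frames to obtain the
    frequency characteristic of the stationary sound at this microphone."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    spectra = [np.abs(np.fft.rfft(frame)) for frame in frames]
    return np.mean(spectra, axis=0)   # could be stored as stationary sound information 220
```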
For example, each of the microphones provided as the sensors 10 in the house H may detect in advance the stationary sound of the location where that microphone is provided. Information in which the stationary sound detected by each microphone is associated with the location where that sensor 10 is provided may be stored in the storage unit 22 as the stationary sound information 220. In this case, the stationary sound reduction unit 231N may reduce, from the sound detected by a sensor 10, the stationary sound corresponding to the location where that sensor 10 is provided.
Here, the characteristic sound will be described with reference to FIG. 11. A characteristic sound is a sound having characteristics different from those of a stationary sound. Specifically, a stationary sound has similar frequency components and is output continuously, whereas a characteristic sound is a sound caused by the presence of a user or the like and is output suddenly and briefly. For example, a characteristic sound is a sound corresponding to a human voice extracted based on its formant structure (the frequency characteristics of the voice of a person who is speaking). The example of FIG. 11 shows that a characteristic sound such as a voice is output from a user SA2 and that its frequency characteristic is a characteristic N2, and that a characteristic sound such as a cry is output from an animal SA3 and that its frequency characteristic is a characteristic N3.
The characteristic sounds may also include a sine wave that does not exist in nature, for example, the operation sound of a touch panel. For example, if the operation sound of a touch panel indicates that a user operating the touch panel is present, the operation sound of the touch panel is a characteristic sound.
Information related to characteristic sounds may be stored in the storage unit 22. In this case, the individual corresponding to a characteristic sound may be made identifiable. Specifically, the harmonic structure (frequency characteristic) of the voice of each of a plurality of users is detected in advance. The plurality of users may include family members, housemates, and animals such as a pet cat. The detected harmonic structure of each voice is stored in the storage unit 22 as the characteristic sound of the corresponding user. When a touch panel or the like is operated only by a specific individual user, the operation sound of that touch panel is stored in the storage unit 22 as a characteristic sound of that specific individual user. In this case, the characteristic sound extraction unit 2313 determines whether or not the sound acquired from the stationary sound reduction unit 231N includes a characteristic sound based on the frequency characteristics of the characteristic sounds stored in the storage unit 22.
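The per-user matching described above can be sketched as follows. It assumes that each user's characteristic sound is stored as a reference magnitude spectrum and reuses a simple spectral-similarity measure; this is only one of many possible ways to compare harmonic structures.

```python
import numpy as np

def identify_user(observed_spectrum, user_templates, threshold=0.85):
    """Return the identifier of the user whose stored characteristic sound is
    most similar to the observed spectrum, or None if nothing matches.

    user_templates : dict {user_id: reference magnitude spectrum}
    threshold      : minimum similarity required for a match (arbitrary)
    """
    obs = observed_spectrum / (np.linalg.norm(observed_spectrum) + 1e-12)
    best_user, best_score = None, threshold
    for user_id, template in user_templates.items():
        ref = template / (np.linalg.norm(template) + 1e-12)
        score = float(np.dot(obs, ref))
        if score >= best_score:
            best_user, best_score = user_id, score
    return best_user
```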
Here, the operation of this embodiment will be described with reference to FIGS. 12 and 13. FIGS. 12 and 13 are diagrams for explaining the processing performed by the control device 20 of the embodiment. As shown in FIG. 12, it is assumed that microphones (sensors 10-1 to 10-4) serving as the sensors 10 are installed at four positions in the house H, and that speakers 30 (speakers 30-1 to 30-4) are installed near the respective microphones. In this space, a stationary sound is output from the sound source SA1. It is also assumed that the user SA2 is present near the sensor 10-1, the animal SA3 is present near the sensor 10-3, and the user SA4 is present near the sensor 10-4.
Here, a case will be described in which the cry of the animal SA3 is detected as a notification sound and the user SA2 is notified, as indicated by the arrow J in FIG. 13. The microphone of the sensor 10-1 detects the stationary sound and the voice of the user SA2. The microphone of the sensor 10-2 detects only the stationary sound. The microphone of the sensor 10-3 detects the stationary sound and the cry of the animal SA3. The microphone of the sensor 10-4 detects the stationary sound and the voice of the user SA4.
Specifically, as shown in FIG. 13, the microphone of the sensor 10-1 detects a sound exhibiting a frequency characteristic T1 in which the stationary sound and the voice of the user SA2 are combined. The microphone of the sensor 10-2 detects a sound exhibiting a frequency characteristic T2 consisting of only the stationary sound. The microphone of the sensor 10-3 detects a sound exhibiting a frequency characteristic T3 in which the stationary sound and the cry of the animal SA3 are combined. The microphone of the sensor 10-4 detects a sound exhibiting a frequency characteristic T4 in which the stationary sound and the voice of the user SA4 are combined.
The stationary sound reduction units 231N reduce the stationary sound from the sounds detected by the sensors 10-1 to 10-4. As a result, only the voice of the user SA2 is extracted from the sound detected by the microphone of the sensor 10-1. Nothing is extracted from the sound detected by the microphone of the sensor 10-2. Only the cry of the animal SA3 is extracted from the sound detected by the microphone of the sensor 10-3. Only the voice of the user SA4 is extracted from the sound detected by the microphone of the sensor 10-4.
The characteristic sound extraction unit 2313 extracts characteristic sounds from the sounds obtained from the sensors 10-1 to 10-4. Specifically, the voice of the user SA2 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-1. No characteristic sound is extracted from the sound detected by the microphone of the sensor 10-2. The cry of the animal SA3 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-3. The voice of the user SA4 is extracted as a characteristic sound from the sound detected by the microphone of the sensor 10-4.
The position identification unit 2314 identifies that the user SA2 is present at the position of the sensor 10-1 (near the position of the sensor 10-1), that the animal SA3 is present at the position of the sensor 10-3 (near the position of the sensor 10-3), and that the user SA4 is present at the position of the sensor 10-4 (near the position of the sensor 10-4).
The selection unit 232 selects the speaker 30-1, which is installed near the position where the user to be notified, that is, the user SA2, is present. The output unit 233 causes the speaker 30-1 selected by the selection unit 232 to output the cry of the animal SA3 extracted from the microphone of the sensor 10-3.
As shown in the example of FIG. 13, when the level of the stationary sound is high, reducing the stationary sound with the stationary sound reduction unit 231N lowers the level of the remaining sound, which may make it difficult to extract a characteristic sound from the sound after the stationary sound has been reduced. As a countermeasure, when the level of the stationary sound is greater than a predetermined threshold, the stationary sound reduction unit 231N may output the sound detected by the microphone to the characteristic sound extraction unit 2313 as it is, without reducing the stationary sound.
When the characteristic sound extraction unit 2313 extracts a characteristic sound from the sound detected by the microphone using a characteristic sound estimation model, it uses different characteristic sound estimation models depending on whether or not the stationary sound has been reduced. For example, when the stationary sound has been reduced, the characteristic sound extraction unit 2313 uses a model that estimates the characteristic sound from a sound whose stationary sound has been reduced. On the other hand, when the stationary sound has not been reduced, the characteristic sound extraction unit 2313 uses a model that estimates the characteristic sound from a sound whose stationary sound has not been reduced.
FIG. 14 is a flowchart for explaining the flow of processing performed by the control device 20 of the embodiment. The control device 20 acquires the information (sensor information) acquired by the sensors 10 (step S1). When the sensor information includes a sound, the control device 20 determines whether or not the sound is a notification sound (step S2). When the sensor 10 is a microphone, or when some of the plurality of sensors 10 are microphones, the control device 20 determines whether or not the sound is a notification sound by processing such as comparing the frequency characteristic of the sound collected by the microphone with the frequency characteristics of the sounds stored in the storage unit 22 as the notification sound information 221.
When determining that the sound is a notification sound, the control device 20 extracts all sensor information used to identify the position of the user (step S3). For example, when identifying the position of the user using images, the control device 20 extracts the information of the images acquired by the image sensors. When identifying the position of the user using temperature, the control device 20 extracts information indicating the temperatures acquired by the infrared sensors and temperature sensors. When identifying the position of the user using sound, the control device 20 extracts information indicating the sounds collected by the microphones. In the following flow, the case where the control device 20 identifies the position of the user using sound will be described.
The control device 20 reduces the stationary sound from the sounds collected by the microphones (step S4). The control device 20 extracts characteristic sounds from the sounds from which the stationary sound has been reduced (step S5). The control device 20 identifies the position of the user from the extracted characteristic sounds (step S6). The control device 20 selects the speaker 30 that is to output the notification sound based on the identified position of the user (step S7). The control device 20 outputs the notification sound from the selected speaker 30 (step S8).
On the other hand, when determining in step S2 that the sound is not a notification sound, the control device 20 ends the processing.
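The flow of steps S1 to S8 described above can be summarised in a single routine. The sketch below is only an illustration of the control flow: all of the processing steps are passed in as callbacks (they correspond to the notification sound determination, stationary sound reduction, characteristic sound extraction, position identification, and speaker selection already described), and none of their names come from this description.

```python
def handle_sensor_sounds(mic_signals, is_notification, reduce_stationary,
                         extract_feature, level, sensor_position,
                         select_speaker, play):
    """Control flow corresponding to steps S1 to S8 of FIG. 14.

    mic_signals       : dict {sensor_id: sound collected by that microphone}   (S1)
    is_notification   : callable(sound) -> bool                                (S2)
    reduce_stationary : callable(sensor_id, sound) -> sound with the
                        stationary sound reduced                               (S4)
    extract_feature   : callable(sound) -> characteristic sound or None        (S5)
    level             : callable(characteristic sound) -> loudness value
    sensor_position   : callable(sensor_id) -> position of that sensor         (S6)
    select_speaker    : callable(position) -> speaker identifier               (S7)
    play              : callable(speaker_id, sound), drives a speaker 30       (S8)
    """
    for sound in mic_signals.values():
        if not is_notification(sound):
            continue
        levels = {}
        for sensor_id, signal in mic_signals.items():                          # S3
            feature = extract_feature(reduce_stationary(sensor_id, signal))
            levels[sensor_id] = level(feature) if feature is not None else 0.0
        user_position = sensor_position(max(levels, key=levels.get))
        play(select_speaker(user_position), sound)
```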
The method by which the learning unit 235 generates the learned models (the position estimation model, the characteristic sound estimation model, and the notification sound estimation model) will be described with reference to FIG. 15. FIG. 15 is a flowchart for explaining the flow of processing for generating a machine learning model according to the embodiment.
(Position estimation model)
First, the position estimation model will be described. The position estimation model is a model for estimating the position of the user. Here, the case where the position estimation model estimates whether or not a sound collected by a microphone is a sound caused by the user will be described as an example.
The learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11). The learning unit 235 generates a learning data set (step S12). The learning data set here is information in which each sound collected by the microphone is labeled to indicate whether or not the sound is caused by the user. The learning unit 235 determines whether or not all sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not yet been acquired, the learning unit 235 returns to step S11.
When all the sensor information for machine learning has been acquired, the learning unit 235 generates the position estimation model (learned model) (step S14). The learning unit 235 generates the position estimation model (learned model) by having a machine learning model such as a CNN (Convolutional Neural Network) learn the learning data set. The learning unit 235 repeatedly trains the machine learning model on the learning data set while adjusting the parameters of the machine learning model so that, when a sound collected by the microphone in the learning data set is input to the machine learning model, the value output from the machine learning model approaches the label attached to that sound (the label indicating whether or not the sound is caused by the user). This makes it possible to generate a model that can accurately estimate whether or not a sound collected by the microphone is caused by the user. The learning unit 235 stores the generated position estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
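A minimal training loop of the kind described above might look as follows. This sketch assumes PyTorch and a small convolutional network operating on fixed-size spectrogram inputs; the network architecture, loss function, and optimiser are not specified in this description and are chosen here only for illustration.

```python
import torch
import torch.nn as nn

class SoundClassifier(nn.Module):
    """Small CNN that outputs a score for whether an input spectrogram
    (1 x 64 x 64) is a sound caused by the user."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, 1),
        )

    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10):
    """loader yields (spectrogram batch, label batch) pairs, where a label is
    1.0 if the sound is caused by the user and 0.0 otherwise."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for spectrograms, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(spectrograms).squeeze(1), labels)
            loss.backward()   # adjust parameters so the output approaches the label
            optimizer.step()
    return model
```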
(Characteristic sound estimation model)
Next, the characteristic sound estimation model will be described. The characteristic sound estimation model is a model for estimating a characteristic sound. Here, the case where the characteristic sound estimation model estimates whether or not a sound collected by a microphone is a characteristic sound will be described as an example.
The learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11). The learning unit 235 generates a learning data set (step S12). The learning data set here is information in which each sound collected by the microphone is labeled to indicate whether or not the sound is a characteristic sound. The learning unit 235 determines whether or not all sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not yet been acquired, the learning unit 235 returns to step S11.
When all the sensor information for machine learning has been acquired, the learning unit 235 generates the characteristic sound estimation model (learned model) (step S14). The learning unit 235 generates the characteristic sound estimation model (learned model) by having a machine learning model such as a CNN learn the learning data set. The learning unit 235 repeatedly trains the machine learning model on the learning data set while adjusting the parameters of the machine learning model so that, when a sound collected by the microphone in the learning data set is input to the machine learning model, the value output from the machine learning model approaches the label attached to that sound (the label indicating whether or not the sound is a characteristic sound). This makes it possible to generate a model that can accurately estimate whether or not a sound collected by the microphone is a characteristic sound. The learning unit 235 stores the generated characteristic sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
(Notification sound estimation model)
Next, the notification sound estimation model will be described. The notification sound estimation model is a model for estimating a notification sound. Here, the case where the notification sound estimation model estimates whether or not a sound collected by a microphone is a notification sound will be described as an example.
The learning unit 235 acquires sensor information for machine learning, here information on sounds collected by a microphone (step S11). The learning unit 235 generates a learning data set (step S12). The learning data set here is information in which each sound collected by the microphone is labeled to indicate whether or not the sound is a notification sound. The learning unit 235 determines whether or not all sensor information for machine learning has been acquired (step S13). If all the sensor information for machine learning has not yet been acquired, the learning unit 235 returns to step S11.
When all the sensor information for machine learning has been acquired, the learning unit 235 generates the notification sound estimation model (learned model) (step S14). The learning unit 235 generates the notification sound estimation model (learned model) by having a machine learning model such as a CNN learn the learning data set. The learning unit 235 repeatedly trains the machine learning model on the learning data set while adjusting the parameters of the machine learning model so that, when a sound collected by the microphone in the learning data set is input to the machine learning model, the value output from the machine learning model approaches the label attached to that sound (the label indicating whether or not the sound is a notification sound). This makes it possible to generate a model that can accurately estimate whether or not a sound collected by the microphone is a notification sound. The learning unit 235 stores the generated notification sound estimation model (learned model) in the storage unit 22 as the learned model information 223 (step S15).
As described above, the control device 20 according to the embodiment controls the sounds output by each of the plurality of speakers 30 provided in a space. The control device 20 includes the acquisition unit 230, the identification unit 231, the selection unit 232, and the output unit 233. The acquisition unit 230 acquires the sensor information (detection results) detected by the plurality of sensors 10 provided in the space. The identification unit 231 identifies the position of a user present in the space based on the detection results acquired by the acquisition unit 230. The selection unit 232 selects, from among the plurality of speakers 30, a speaker 30 installed in the vicinity of the position of the user identified by the identification unit 231. The selection unit 232 may select the speaker 30 installed closest to the position of the user from among the plurality of speakers 30. The output unit 233 causes the speaker selected by the selection unit 232 to output sound.
In the control device 20 according to the embodiment, the identification unit 231 identifies the position of the user based on the output obtained by inputting the detection results into the position estimation model. The position estimation model is a learned model that has learned the correspondence between a detection result and whether or not the detection result was detected due to the presence of a user, using a learning data set in which each detection result is labeled to indicate whether or not it was detected due to the presence of a user.
In the control device 20 according to the embodiment, the sensor 10 may be a microphone. The identification unit 231 identifies the position of the user based on the output obtained by inputting the detection result into the position estimation model. The position estimation model is a model generated by performing learning using a learning data set in which sounds for machine learning are labeled to indicate whether or not each sound is caused by the presence of a user, and it estimates whether or not a sound detected by the microphone is a sound caused by the presence of the user.
 In the control device 20 according to the embodiment, the sensor 10 may be a microphone. When the microphone collects a sound having characteristics different from the steady sound stored in the storage unit 22 as the steady sound information 220 (a predetermined steady sound), the identification unit 231 determines that a user is present near the microphone and identifies the installation position of that microphone as the user's position.
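 One possible way to realize this comparison, assuming the steady sound information 220 is kept as an average magnitude spectrum per microphone, is sketched below; the spectral-deviation measure and the threshold value are illustrative assumptions.

import numpy as np

def differs_from_steady_sound(frame, steady_spectrum, threshold=3.0):
    """frame: 1-D audio samples; steady_spectrum: reference magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame, n=2 * (len(steady_spectrum) - 1)))
    deviation = np.linalg.norm(spectrum - steady_spectrum) / np.linalg.norm(steady_spectrum)
    return deviation > threshold   # True: a user is presumed to be near this microphone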
 In the control device 20 according to the embodiment, the sensors 10 may include a microphone. The identification unit 231 determines, based on the characteristics of a sound collected by the microphone, whether the sound is a notification sound to be notified to the user. The output unit 233 causes the speaker 30 selected by the selection unit 232 to output the sound determined by the identification unit 231 (an example of a determination unit) to be a notification sound.
 In the control device 20 according to the embodiment, the sensors 10 may include a microphone. The identification unit 231 determines whether a sound collected by the microphone is a notification sound based on the output obtained by inputting the sound into a notification sound estimation model. The notification sound estimation model is a model generated by performing learning using a training data set in which each sound for machine learning is labeled to indicate whether the sound is a notification sound to be notified to the user, and the model estimates whether a sound detected by the microphone is a notification sound.
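 By way of example only, the judgment and routing described here could look like the following sketch, in which notification_model stands for the trained notification sound estimation model and play stands for the output unit; both names and the threshold are assumptions.

def route_if_notification(sound, notification_model, speaker, play, threshold=0.5):
    probability = notification_model(sound)   # output of the estimation model
    if probability >= threshold:              # judged to be a notification sound
        play(speaker, sound)                  # output unit 233: replay near the user
        return True
    return False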
(Modification 1 of Embodiment)
 Modification 1 of the embodiment will now be described. This modification differs from the embodiment described above in that additional learning is performed on the notification sound estimation model. FIG. 16 is a flowchart explaining the flow of the additional learning process according to Modification 1 of the embodiment.
 The control device 20 acquires sensor information for estimation, in this case information on a sound collected by a microphone (step S21). The control device 20 uses the notification sound estimation model (trained model) to estimate whether the sound collected by the microphone is a notification sound (step S22). The control device 20 then determines whether the estimation by the notification sound estimation model is correct (step S23). For example, the control device 20 makes this determination based on information entered by the user operating a keyboard or the like.
 If the estimation by the notification sound estimation model is incorrect, the control device 20 generates a training data set for additional learning (step S24). The training data set for additional learning consists of the sounds that the notification sound estimation model estimated incorrectly, each attached with the correct label indicating whether the sound is a notification sound.
 The control device 20 determines whether to perform additional learning (step S25). For example, the control device 20 determines to perform additional learning when the number of training examples for additional learning generated in step S24 reaches a predetermined number. Alternatively, the control device 20 may determine to perform additional learning when the probability of the notification sound estimation model making an incorrect estimation is equal to or greater than a predetermined value.
 When performing additional learning, the control device 20 has the model learn the training data set for additional learning, thereby updating the notification sound estimation model (trained model) (step S26). The control device 20 stores the updated notification sound estimation model (trained model) (step S27).
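 A minimal sketch of this additional-learning flow (steps S21 to S27) is given below, assuming the user's correction is available as a boolean label and that a retrain callable performs the actual additional learning; the buffer size of 32 is an illustrative assumption.

additional_samples = []          # training data set for additional learning (S24)

def on_estimation_feedback(sound, predicted_is_notification, user_says_notification,
                           model, retrain, min_samples=32):
    if predicted_is_notification == user_says_notification:
        return model             # estimation was correct (S23): nothing to learn
    # S24: keep the misjudged sound together with its corrected label.
    additional_samples.append((sound, user_says_notification))
    # S25: decide whether enough corrections have accumulated.
    if len(additional_samples) < min_samples:
        return model
    # S26: additional learning on the accumulated corrections; S27: store the update.
    updated = retrain(model, additional_samples)
    additional_samples.clear()
    return updated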
 As described above, in the control device 20 according to Modification 1 of the embodiment, the acquisition unit 230 acquires the result of the user's judgment as to whether the sound output by the output unit 233 is a notification sound. The learning unit 235 performs additional learning on the notification sound estimation model based on the judgment result acquired by the acquisition unit 230. As a result, the control device 20 according to Modification 1 can perform additional learning on the notification sound estimation model and, when an estimation is incorrect, correct that estimation error. Alternatively, when a new notification sound is introduced, the notification sound estimation model can be trained on the added notification sound.
 Although the above describes an example in which additional learning is performed on the notification sound estimation model, the present disclosure is not limited to this. The control device 20 can also perform additional learning on the position estimation model and the feature sound estimation model using a similar method.
(Modification 2 of Embodiment)
 Modification 2 of the embodiment will now be described. FIG. 17 shows an example in which a plurality of users U1 and U2 are present in a house H. This modification differs from the embodiment described above in that user U1 carries a transmitting terminal T. The transmitting terminal T is a terminal device that transmits a signal indicating the presence of user U1, and is, for example, a smartphone or a beacon terminal. The acquisition unit 230 acquires the signal (position signal) transmitted from the transmitting terminal T. The identification unit 231 identifies the position of user U1 (the transmitting terminal) based on the signal (position signal) acquired by the acquisition unit 230.
 As described above, in the control device 20 according to Modification 2 of the embodiment, a plurality of users U1 and U2 may be present in the space. At least one user U1 among the plurality of users U1 and U2 carries a transmitting terminal T that transmits a position signal indicating the position of user U1. The acquisition unit 230 acquires the signal (position signal) transmitted from the transmitting terminal T, and the identification unit 231 identifies the position of user U1 based on the acquired signal. As a result, the control device 20 according to Modification 2 can accurately identify the position of the user U1 carrying the transmitting terminal T based on the signal transmitted from that terminal.
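 For illustration, if the position signal is a beacon whose received signal strength (RSSI) is measured by receivers with known installation positions, the position of user U1 could be approximated as sketched below; the RSSI-based nearest-receiver rule is an assumption, not the only possible realization.

def locate_transmitting_terminal(rssi_by_receiver, receiver_positions):
    """rssi_by_receiver: {receiver_id: dBm}; stronger (less negative) means closer."""
    if not rssi_by_receiver:
        return None
    nearest = max(rssi_by_receiver, key=rssi_by_receiver.get)
    # The receiver with the strongest signal is taken as the position of user U1.
    return receiver_positions[nearest]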
(Modification 3 of Embodiment)
 Modification 3 of the embodiment will now be described. This modification assumes a case in which the user listens to a sound that continues for a predetermined period of time, such as music. It differs from the embodiment described above in that, when the user moves within the space, the sound output follows that movement.
 The identification unit 231 (an example of a determination unit) determines, based on the characteristics of a sound collected by a microphone, whether the sound is a sound to be output following the user's movement (that is, a sound for which the speaker 30 that outputs it should be changed to follow the user's movement).
 When the sound collected by the microphone is a sound to be output following the user's movement, the identification unit 231 identifies the user near the microphone that collected the sound. For example, the identification unit 231 identifies the user near the microphone based on a characteristic sound collected together with the sound to be output following the user's movement. While the sound to be output following the user's movement is being acquired by the microphones, the identification unit 231 repeatedly identifies the position of that user.
 When the position of the user identified this time differs from the position identified last time, the selection unit 232 selects a speaker 30 installed near the newly identified position.
 The output unit 233 stops the sound that was being output from the speaker 30 previously selected by the selection unit 232 and causes the newly selected speaker 30 to output the sound to be output following the user's movement.
 As described above, in the control device 20 according to Modification 3 of the embodiment, the sensors 10 may include microphones. The identification unit 231 determines whether a sound collected by a microphone is a sound to be output following the user's movement. The identification unit 231 identifies the user near the microphone that collected the sound determined to be a sound to be output following the user's movement, associates the identified user's position with that sound, and repeatedly identifies the user's position. When the position of the user identified this time differs from the position identified last time, the selection unit 232 selects a speaker 30 installed near the newly identified position. The output unit 233 stops the sound that was being output from the previously selected speaker 30 and causes the newly selected speaker 30 to output the sound to be output following the user's movement. As a result, when the user is listening to music or the like and moves away from the installation position of the speaker 30 outputting that sound, the control device 20 according to Modification 3 can output the sound from a speaker 30 at the user's destination, changing the output speaker 30 to follow the movement. The user can thus continue listening to the sound while moving through the space.
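 A possible sketch of this speaker handoff is shown below, assuming the user's position is re-identified periodically while the followed sound (for example, music) is playing; the FollowingPlayback class and the injected select_nearest, start, and stop callables are assumptions introduced for this example.

class FollowingPlayback:
    def __init__(self, speakers, select_nearest, start, stop):
        self.speakers = speakers
        self.select_nearest = select_nearest   # e.g. nearest-speaker selection
        self.start, self.stop = start, stop    # output-unit operations (assumed)
        self.current_speaker = None

    def update(self, user_position, sound):
        speaker = self.select_nearest(user_position, self.speakers)
        if speaker is self.current_speaker:
            return                             # the user has not moved between speakers
        if self.current_speaker is not None:
            self.stop(self.current_speaker)    # stop the previously selected speaker
        self.start(speaker, sound)             # continue the sound near the user
        self.current_speaker = speaker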
 All or part of the speaker system 1 and the control device 20 in the embodiments described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system. Furthermore, the "computer-readable recording medium" may also include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside the computer system serving as the server or client in that case. The program may be one for realizing part of the functions described above, may be one that realizes the functions described above in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA.
 Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are likewise included in the invention described in the claims and its equivalents.
 The present disclosure may be applied to a control method, a control device, and a program.
Reference Signs List
1 speaker system
10 sensor
20 control device
230 acquisition unit
231 identification unit (determination unit)
232 selection unit
233 output unit
234 device control unit
235 learning unit
30 speaker

Claims (13)

  1.  A control method performed by a computer, the method comprising:
     acquiring a detection result detected by at least one sensor provided in a space;
     identifying a position of a user in the space based on the detection result;
     selecting, from a plurality of speakers provided in the space, a speaker installed near the identified position of the user; and
     causing the selected speaker to output sound.
  2.  The control method according to claim 1, wherein identifying the position of the user includes identifying the position of the user based on an output obtained by inputting the detection result into a position estimation model, and
     the position estimation model is a model generated by performing learning using a training data set including detection results for machine learning, each labeled to indicate whether the detection result was detected due to the presence of the user.
  3.  The control method according to claim 1 or 2, further comprising:
     acquiring, from a transmitting terminal carried by a second user present in the space, a position signal indicating a position of the second user; and
     identifying the position of the second user based on the position signal.
  4.  The control method according to any one of claims 1 to 3, wherein identifying the position of the user includes identifying the position of the user based on an output obtained by inputting a sound detected by the sensor into a position estimation model, and
     the position estimation model is a model generated by performing learning using a training data set including sounds each labeled to indicate whether the sound is caused by the presence of the user.
  5.  The control method according to any one of claims 1 to 4, wherein identifying the position of the user includes identifying an installation position of the sensor as the position of the user when a sound having characteristics different from a predetermined steady sound is collected by the sensor.
  6.  The control method according to any one of claims 1 to 5, further comprising determining, based on characteristics of a sound collected by the sensor, whether the sound collected by the sensor is a notification sound to be notified to the user,
     wherein causing the selected speaker to output the sound includes causing the selected speaker to output the sound collected by the sensor when the sound collected by the sensor is determined to be the notification sound.
  7.  The control method according to any one of claims 1 to 6, further comprising determining, based on an output obtained by inputting a sound collected by the sensor into a notification sound estimation model, whether the sound collected by the sensor is a notification sound to be notified to the user,
     wherein the notification sound estimation model is a model generated by performing learning using a training data set including sounds for machine learning, each labeled to indicate whether the sound is the notification sound.
  8.  The control method according to claim 7, further comprising:
     acquiring a result of a judgment by the user as to whether the output sound is the notification sound; and
     performing additional learning on the notification sound estimation model based on the judgment result.
  9.  The control method according to any one of claims 1 to 8, further comprising setting at least one speaker other than the selected speaker among the plurality of speakers to a standby mode.
  10.  The control method according to any one of claims 1 to 9, further comprising determining, based on characteristics of a sound collected by the sensor, whether the sound collected by the sensor is a sound to be output following movement of the user, wherein
     identifying the position of the user includes repeating the identification of the position of the user when the sound collected by the sensor is determined to be the sound to be output following the movement of the user,
     selecting the speaker includes selecting, from the plurality of speakers, a speaker installed near the position of the user identified this time when the position of the user identified last time differs from the position of the user identified this time, and
     causing the selected speaker to output sound includes causing the speaker installed near the position of the user identified this time to output the sound to be output following the movement of the user.
  11.  The control method according to claim 10, further comprising stopping output of the sound to be output following the movement of the user from the speaker that was installed near the position of the user identified last time.
  12.  A control device comprising:
     an acquisition unit configured to acquire a detection result detected by at least one sensor provided in a space;
     an identification unit configured to identify a position of a user in the space based on the detection result;
     a selection unit configured to select, from a plurality of speakers provided in the space, a speaker installed near the identified position of the user; and
     an output unit configured to cause the selected speaker to output sound.
  13.  A program causing a computer to execute:
     acquiring a detection result detected by at least one sensor provided in a space;
     identifying a position of a user in the space based on the detection result;
     selecting, from a plurality of speakers provided in the space, a speaker installed near the identified position of the user; and
     causing the selected speaker to output sound.
PCT/JP2022/003789 2021-03-26 2022-02-01 Control method, control device, and program WO2022201876A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-052696 2021-03-26
JP2021052696A JP2022150204A (en) 2021-03-26 2021-03-26 Control method, controller, and program

Publications (1)

Publication Number Publication Date
WO2022201876A1 true WO2022201876A1 (en) 2022-09-29

Family

ID=83396791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/003789 WO2022201876A1 (en) 2021-03-26 2022-02-01 Control method, control device, and program

Country Status (2)

Country Link
JP (1) JP2022150204A (en)
WO (1) WO2022201876A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010200212A (en) * 2009-02-27 2010-09-09 Sony Corp Information processing apparatus and method, and program
JP2021002013A (en) * 2019-06-24 2021-01-07 日本キャステム株式会社 Notification sound detection device and notification sound detection method
JP2021015084A (en) * 2019-07-16 2021-02-12 Kddi株式会社 Sound source localization device and sound source localization method

Also Published As

Publication number Publication date
JP2022150204A (en) 2022-10-07

Similar Documents

Publication Publication Date Title
US9668073B2 (en) System and method for audio scene understanding of physical object sound sources
US11308955B2 (en) Method and apparatus for recognizing a voice
JP2020505648A (en) Change audio device filter
JP7212718B2 (en) LEARNING DEVICE, DETECTION DEVICE, LEARNING METHOD, LEARNING PROGRAM, DETECTION METHOD, AND DETECTION PROGRAM
CN104103271B (en) Method and system for adapting speech recognition acoustic models
US20200152191A1 (en) Information processor and information procesing method
JP5626372B2 (en) Event detection system
Portet et al. Context-aware voice-based interaction in smart home-vocadom@ a4h corpus collection and empirical assessment of its usefulness
WO2019235134A1 (en) Information generation device, information processing system, information processing method, and program
Yang et al. Soundr: head position and orientation prediction using a microphone array
WO2022201876A1 (en) Control method, control device, and program
JP7452528B2 (en) Information processing device and information processing method
WO2021199284A1 (en) Information processing device, information processing method, and information processing program
WO2021226574A1 (en) System and method for multi-microphone automated clinical documentation
US11620997B2 (en) Information processing device and information processing method
JP7408518B2 (en) Information processing device, information processing method, information processing program, terminal device, inference method, and inference program
US20220360935A1 (en) Sound field control apparatus and method for the same
US20220391758A1 (en) Sound detection for electronic devices
JP6688820B2 (en) Output device, output method, and output program
US10601757B2 (en) Multi-output mode communication support device, communication support method, and computer program product
Völker et al. iHouse: A Voice-Controlled, Centralized, Retrospective Smart Home
JP2020086011A (en) Extraction device, learning device, extraction method, extraction program, learning method, and learning program
WO2023141564A1 (en) Data augmentation system and method for multi-microphone systems
WO2023141557A1 (en) Data augmentation system and method for multi-microphone systems
WO2023141565A1 (en) Data augmentation system and method for multi-microphone systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22774678

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22774678

Country of ref document: EP

Kind code of ref document: A1