WO2021230067A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method

Info

Publication number: WO2021230067A1
Authority: WO (WIPO (PCT))
Application number: PCT/JP2021/016706
Prior art keywords: unit, sensor, tap, information processing, detection result
Other languages: French (fr), Japanese (ja)
Inventor: Kei Takahashi (高橋 慧)
Original assignee: Sony Group Corporation (ソニーグループ株式会社)
Application filed by Sony Group Corporation
Publication of WO2021230067A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones

Definitions

  • The present technology relates to an information processing apparatus and an information processing method, and more particularly to an information processing apparatus and an information processing method capable of identifying vibration generated by a user's operation.
  • Patent Document 1 describes a small terminal that a user can operate by gesture, voice, and button operation.
  • Vibration generated by the user tapping around an earphone can be detected by an acceleration sensor mounted on the earphone, and the earphone recognizes the user's operation from that detection.
  • However, the vibration generated by the user tapping around the earphone may resemble noise, such as the vibration caused by the user walking or chewing, or the vibration of the sound output from the earphone itself.
  • When the vibration is detected only by the acceleration sensor mounted on the earphone, it is therefore difficult to distinguish the vibration generated by the user's operation from the noise.
  • The present technology was made in view of such a situation, and makes it possible to identify the vibration generated by a user's operation.
  • The information processing apparatus of the first aspect of the present technology includes two terminals equipped with sensors that detect vibration, including both vibration representing a user operation and noise. One terminal includes a receiving unit that receives the detection result of a first sensor mounted on the other terminal, and a determination unit that determines, based on the detection result received by the receiving unit, whether the detection result of a second sensor mounted on the one terminal is noise.
  • In the information processing method of the first aspect of the present technology, in an information processing apparatus including two terminals equipped with sensors for detecting vibration, one terminal receives the detection result of the first sensor mounted on the other terminal and determines, based on the received detection result, whether the detection result of the second sensor mounted on the one terminal is noise.
  • The information processing device of the second aspect of the present technology includes a sensor unit that detects vibration, including both vibration representing a user operation and noise, and a recognition unit that recognizes the user's operation based on the detection result of the sensor unit and a prediction result of the noise detected by the sensor unit.
  • That is, in the second aspect of the present technology, the user's operation is recognized based on the detection result of the sensor unit and the prediction result of the noise detected by the sensor unit.
  • FIG. 1 is a diagram showing an example of the appearance of the earphone (inner-ear headphone) 10 according to an embodiment of the present technology.
  • The earphone 10 is an acoustic output device that is attached to the user's ear and lets the user listen to the sound output from a built-in driver.
  • The earphone 10 has a left ear terminal 10L and a right ear terminal 10R, and is a left-right independent earphone in which the two terminals are not physically connected.
  • The left ear terminal 10L and the right ear terminal 10R are connected via a wireless communication path such as NFMI (Near Field Magnetic Induction).
  • Each of the left ear terminal 10L and the right ear terminal 10R is equipped with a processing device such as a CPU (Central Processing Unit), an acceleration sensor, and a sound output device.
  • The user can operate the earphone 10 by tapping around the ear where the earphone 10 is worn. Normally, such an operation is performed on either the left ear side or the right ear side at a given time.
  • When the left ear terminal 10L or the right ear terminal 10R detects the vibration of a tap with the acceleration sensor, it outputs a sound effect to notify the user that the tap has been detected. Since the operation relies on vibration detection, the earphone 10 can naturally be operated not only by tapping around the ear but also by tapping the main body of the earphone 10 itself.
  • The earphone 10 is an example of an information processing device to which the present technology is applied, and may be configured as a true wireless earphone.
  • FIG. 2 is a block diagram showing a hardware configuration example of the earphone 10.
  • The left ear terminal 10L includes a CPU 101L, a ROM (Read Only Memory) 102L, a RAM (Random Access Memory) 103L, a bus 104L, an input/output I/F unit 105L, a sound output unit 106L, a sensor unit 107L, a communication unit 108L, a storage unit 109L, and a power supply unit 110L.
  • The CPU 101L, the ROM 102L, and the RAM 103L are connected to one another by the bus 104L. The input/output I/F unit 105L is also connected to the bus 104L.
  • The sound output unit 106L, the sensor unit 107L, the communication unit 108L, and the storage unit 109L are connected to the input/output I/F unit 105L.
  • The sound output unit 106L reproduces, for example, music data acquired from an external music playback device and outputs sound. The sound output unit 106L also outputs a sound effect indicating that an operation has been detected.
  • The sensor unit 107L is composed of an IMU (Inertial Measurement Unit) 121L.
  • The IMU 121L is composed of an acceleration sensor, a gyro sensor, and the like, and detects the acceleration, angular acceleration, and so on of the left ear terminal 10L and outputs them as sensor data.
  • The communication unit 108L is composed of a left/right communication unit 131L and an external communication unit 132L.
  • The left/right communication unit 131L is configured as a communication module that supports short-range wireless communication such as NFMI. It communicates with the right ear terminal 10R and exchanges music data, sensor data, and the like.
  • The external communication unit 132L is configured as a communication module compatible with wireless communication such as Bluetooth (registered trademark), wireless LAN (Local Area Network), or cellular communication (for example, LTE-Advanced or 5G), or with wired communication. It communicates with an external device and exchanges sound signals, sensor data, and the like.
  • The external device here includes, for example, a smartphone, a tablet terminal, a personal computer, a server, and a music playback device. Servers include those provided by music distribution services that distribute music over the Internet.
  • The storage unit 109L is composed of, for example, non-volatile memory, semiconductor memory including volatile memory, and the like. Music data and the like acquired from an external device are recorded in the storage unit 109L.
  • The power supply unit 110L has a battery and supplies power to each unit of the left ear terminal 10L.
  • The right ear terminal 10R has the same configuration as the left ear terminal 10L. Blocks corresponding to each component of the left ear terminal 10L are indicated by the same number followed by "R", and duplicate description is omitted. When it is not necessary to distinguish between the left ear terminal 10L and the right ear terminal 10R, the "L" or "R" suffix is omitted.
  • FIG. 3 is a block diagram showing a functional configuration example of the earphone 10.
  • The information processing unit 141 is realized by the CPU 101 of FIG. 2 executing a predetermined program.
  • The configuration shown in FIG. 3 is provided in each of the left ear terminal 10L and the right ear terminal 10R.
  • The "other terminal" described later is the right ear terminal 10R when the information processing unit 141 of FIG. 3 belongs to the left ear terminal 10L, and the left ear terminal 10L when it belongs to the right ear terminal 10R.
  • When the configurations provided in the left ear terminal 10L and the right ear terminal 10R need to be distinguished, they are described with "L" or "R" appended to the same reference numeral.
  • The information processing unit 141 is composed of a one-ear tap detection unit 151, a transmission control unit 152, a reception control unit 153, a tap determination unit 154, a sound control unit 155, and an execution unit 156.
  • The one-ear tap detection unit 151 acquires sensor data from the IMU 121 and detects taps based on the sensor data. That is, it detects the presence or absence of a tap operation from the vibration generated by the user's actions, which include the tap operation and other movements. Movements other than tapping include, for example, walking and chewing.
  • Here, the "tap operation" refers to the user's action (for example, the user's fingertip striking the main body of the earphone 10), and a "tap" refers to the vibration detected by the earphone 10 as being caused by the user's tap operation.
  • The one-ear tap detection unit 151 supplies event information indicating that a tap has been detected to the transmission control unit 152, and supplies the sensor data acquired from the IMU 121 or the event information to the tap determination unit 154.
  • The one-ear tap detection unit 151 also supplies an output request for the operation recognition sound, which is a sound effect indicating that a tap has been detected, to the sound control unit 155.
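  • The publication states only that taps are detected from the IMU's acceleration data. As an illustrative sketch, one-ear tap detection could be a simple threshold on the acceleration norm; the window shape and the ACCEL_THRESHOLD value below are assumptions, not details from the publication.

```python
import numpy as np

ACCEL_THRESHOLD = 3.0  # assumed value; the publication gives no number


def detect_tap(accel_window: np.ndarray) -> bool:
    """Return True if the acceleration window looks like a tap.

    accel_window: shape (N, 3) array of x/y/z accelerometer samples.
    """
    # Use the norm of the acceleration vector, as the publication does
    # when computing peak timings in the second embodiment.
    norm = np.linalg.norm(accel_window, axis=1)
    return bool(norm.max() > ACCEL_THRESHOLD)
```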
  • The transmission control unit 152 is supplied from the IMU 121 with the same sensor data that the one-ear tap detection unit 151 acquires.
  • The transmission control unit 152 controls the left/right communication unit 131 to transmit the event information supplied from the one-ear tap detection unit 151 and the sensor data supplied from the IMU 121 to the other terminal.
  • The reception control unit 153 acquires the event information and sensor data transmitted from the other terminal via the left/right communication unit 131 and supplies them to the tap determination unit 154. The event information transmitted from the other terminal indicates that a tap was detected at the other terminal.
  • Based on the sensor data supplied from the one-ear tap detection unit 151 and the sensor data supplied from the reception control unit 153, the tap determination unit 154 determines whether the event in which a tap was detected by the one-ear tap detection unit 151 is a tap event or a tap-like event.
  • A tap event is an event indicating that a tap operation has been detected. A tap-like event is an event indicating that some action other than the tap operation has caused the detected vibration.
  • In other words, the tap determination unit 154 is a determination unit that determines whether the vibration detection by the IMU 121 is noise, such as the detection of vibration generated by an action other than the tap operation.
  • When the tap determination unit 154 determines that the event in which a tap was detected by the one-ear tap detection unit 151 is a tap-like event, it supplies an output request for the cancel sound to the sound control unit 155. The cancel sound is a sound indicating that the detection of the tap has been canceled.
  • When the tap determination unit 154 determines that the event is a tap event, it supplies an output request for the function execution sound, which is a sound indicating that the function assigned to the tap event is executed, to the sound control unit 155, and controls the execution unit 156 to execute that function.
  • The sound control unit 155 outputs from the sound output unit 106 the sound effects corresponding to the output requests, such as the operation recognition sound requested by the one-ear tap detection unit 151 and the cancel sound and function execution sound requested by the tap determination unit 154.
  • The execution unit 156 executes the function assigned to the tap event under the control of the tap determination unit 154. The tap event is assigned, for example, processing related to content reproduction, such as starting music playback or skipping to the next song.
  • Tap detection is performed separately on both terminals, and when a tap is detected, the operation recognition sound is immediately reproduced as a reaction to the tap operation. This makes it possible to notify the user that the tap operation produced an immediate reaction.
  • When the detection is later judged to be noise, the tap detection is canceled and the cancel sound is played. This makes it possible to notify the user that the tap was registered but has been canceled.
  • FIG. 4 is a diagram showing an example of the reproduction timing of the sound effects.
  • A in FIG. 4 shows the reproduction timing of the sound effects when a tap operation is performed by the user.
  • The operation recognition sound is reproduced at time t1, immediately after the tap is detected. The function execution sound is played at time t2, when a predetermined timeout period has elapsed from time t1, and the function assigned to the tap event is executed at the same timing.
  • B in FIG. 4 shows the reproduction timing of the sound effects when an action other than the tap operation is performed by the user.
  • As before, the operation recognition sound is reproduced at time t1, immediately after the tap is detected. Event information and sensor data are transmitted from the other terminal as information used to determine whether to cancel the tap, and the tap detection is canceled at time t3. The cancel sound is played at time t4, between time t3 and time t2. Even when the tap detection is canceled, the cancel sound may be omitted.
  • In this way, the operation recognition sound is played immediately after a tap is detected, and the function execution sound is played after the predetermined timeout period has elapsed. The operation recognition sound is always reproduced when a tap is detected on either the left ear terminal 10L or the right ear terminal 10R, whereas the function execution sound is played only when information for canceling the tap detection has not arrived from the other terminal before the timeout period elapses.
  • A tap is thus detected in each of the left ear terminal 10L and the right ear terminal 10R, and the information of both terminals is used to determine whether the event in which the tap was detected is a tap event or a tap-like event.
  • Two methods can be considered for exchanging the information used to determine whether the event in which a tap was detected is a tap event or a tap-like event.
  • First method: transmitting event information indicating that a tap has been detected from one terminal to the other terminal.
  • Second method: transmitting sensor data from one terminal to the other terminal.
  • FIG. 5 is a diagram showing the flow of information in the first method.
  • In the left ear terminal 10L, the sensor data of the IMU 121L is monitored by the one-ear tap detection unit 151L. When a tap is detected, the event information is transmitted to the tap determination unit 154R of the right ear terminal 10R.
  • Similarly, in the right ear terminal 10R, the sensor data of the IMU 121R is monitored by the one-ear tap detection unit 151R. When a tap is detected, the event information is supplied from the one-ear tap detection unit 151R to the tap determination unit 154R.
  • Based on the event information transmitted from the left ear terminal 10L and the event information supplied from the one-ear tap detection unit 151R, the tap determination unit 154R determines whether the event in which the tap was detected by the one-ear tap detection unit 151R is a tap event.
  • The function execution sound and the cancel sound are reproduced as described above based on the determination result of the tap determination unit 154R.
  • In step S101 of the first method (the flowchart of FIG. 6), the one-ear tap detection unit 151R of the right ear terminal 10R detects a tap based on the sensor data of the IMU 121R.
  • In step S102, the sound control unit 155R of the right ear terminal 10R reproduces the operation recognition sound and outputs it from the sound output unit 106R.
  • In step S103, the tap determination unit 154R determines whether event information indicating that a similar tap event has been detected has been received from the left ear terminal 10L within a certain period.
  • If it is determined in step S103 that the same event information has been received from the left ear terminal 10L within the certain period, the process proceeds to step S104. Since a tap has been detected by the one-ear tap detection unit 151R, receiving event information indicating that a tap was also detected at the left ear terminal 10L means that the same event information has been received.
  • In this case, the tap determination unit 154R determines that the event in which the tap was detected by the one-ear tap detection unit 151R is a tap-like event, that is, that the detection is invalid.
  • In step S104, the sound control unit 155R reproduces the cancel sound and outputs it from the sound output unit 106R. The cancel sound may be output from the sound output units 106 of both terminals at the same time. After the cancel sound is output, the process ends.
  • If it is determined in step S103 that the same event information has not been received from the left ear terminal 10L within the certain period, the tap determination unit 154R determines that the event in which the tap was detected by the one-ear tap detection unit 151R is a tap event, that is, that the detection is valid.
  • In step S105, the sound control unit 155R reproduces the function execution sound and outputs it from the sound output unit 106R. The function execution sound may be output from the sound output units 106 of both terminals at the same time.
  • In step S106, the execution unit 156R executes a predetermined function according to the tap event. After the predetermined function is executed, the process ends.
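  • The timeout-based cancellation of steps S101 through S106 can be sketched as follows. This is a hypothetical simplification: the timeout value, the message format, and the link.poll helper are assumptions rather than details from the publication.

```python
import time

TIMEOUT_S = 0.3  # assumed timeout; the publication does not give a value


def handle_local_tap(link, play_sound, execute_function):
    """First method: cancel the tap if the other terminal reports one too.

    link.poll() is a hypothetical non-blocking receive returning event
    information sent by the other terminal, or None.
    """
    play_sound("operation_recognition")           # step S102
    deadline = time.monotonic() + TIMEOUT_S
    while time.monotonic() < deadline:            # step S103
        if link.poll() == "tap_detected":
            # Both terminals saw the vibration: treat it as a tap-like
            # event (walking, chewing, ...) and cancel (step S104).
            play_sound("cancel")
            return
        time.sleep(0.005)
    # Only this terminal saw the vibration: a valid tap event.
    play_sound("function_execution")              # step S105
    execute_function()                            # step S106
```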
  • FIG. 7 is a diagram showing the flow of information in the second method.
  • In the left ear terminal 10L, the value of the sensor data of the IMU 121L is transmitted to the tap determination unit 154R of the right ear terminal 10R. The transmission is performed, for example, when a condition is satisfied, such as the value of the sensor data of the IMU 121L exceeding a predetermined threshold after a predetermined filter is applied. Alternatively, the value of the sensor data of the IMU 121L may be transmitted to the tap determination unit 154R constantly.
  • In the right ear terminal 10R, the sensor data of the IMU 121R is monitored by the one-ear tap detection unit 151R and supplied from the one-ear tap detection unit 151R to the tap determination unit 154R.
  • Based on the sensor data transmitted from the left ear terminal 10L and the sensor data supplied from the one-ear tap detection unit 151R, the tap determination unit 154R determines whether the event in which the tap was detected by the one-ear tap detection unit 151R is a tap event.
  • The function execution sound and the cancel sound are reproduced as described above based on the determination result of the tap determination unit 154R.
  • In step S151 of the second method (the flowchart of FIG. 8), the one-ear tap detection unit 151R of the right ear terminal 10R detects a tap based on the sensor data of the IMU 121R.
  • In step S152, the sound control unit 155R of the right ear terminal 10R reproduces the operation recognition sound and outputs it from the sound output unit 106R.
  • In step S153, the reception control unit 153R receives the value of the sensor data transmitted from the left ear terminal 10L via the left/right communication unit 131R.
  • In step S154, the tap determination unit 154R performs the tap determination process. Specifically, the tap determination unit 154R calculates the similarity between the sensor data value of the IMU 121R and the sensor data value of the IMU 121L transmitted from the left ear terminal 10L and, based on the calculated similarity, determines whether the event in which the tap was detected by the one-ear tap detection unit 151R is a tap event or a tap-like event. The determination based on the similarity of the sensor data values is described later.
  • If it is determined in step S155, using the sensor data of the left ear terminal 10L and the right ear terminal 10R in the tap determination process of step S154, that the event in which the tap was detected by the one-ear tap detection unit 151R was not a tap event, that is, that it was a tap-like event, the process proceeds to step S156.
  • In step S156, the sound control unit 155R reproduces the cancel sound and outputs it from the sound output unit 106R. The cancel sound may be output from the sound output units 106 of both terminals at the same time. After the cancel sound is output, the process ends.
  • If it is determined in step S155 that the event in which the tap was detected by the one-ear tap detection unit 151R was a tap event, the process proceeds to step S157.
  • In step S157, the sound control unit 155R reproduces the function execution sound and outputs it from the sound output unit 106R. The function execution sound may be output from the sound output units 106 of both terminals at the same time.
  • In step S158, the execution unit 156R executes a predetermined function according to the tap event. After the predetermined function is executed, the process ends.
  • Vibrations generated by actions other than the tap operation are often detected simultaneously by the sensors of both the left ear terminal 10L and the right ear terminal 10R. For example, when the user walks or chews, the entire head vibrates. In contrast, the vibration due to a tap is detected as a large vibration only at either the left ear terminal 10L or the right ear terminal 10R.
  • Therefore, when similar vibrations are detected at both terminals, the tap determination unit 154 determines that the tap detection by the one-ear tap detection unit 151 is a tap-like event.
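  • The publication does not specify how the similarity in step S154 is computed. As one concrete possibility, the sketch below uses a normalized cross-correlation between the acceleration-norm windows of the two terminals; the SIMILARITY_THRESHOLD value is likewise an assumption.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.7  # assumed; not specified in the publication


def is_tap_like(local_norm: np.ndarray, remote_norm: np.ndarray) -> bool:
    """Second method: judge the event tap-like if both IMUs saw similar
    vibration at the same time.

    local_norm / remote_norm: acceleration-norm windows of equal length,
    one from each terminal, covering the moment of detection.
    """
    a = local_norm - local_norm.mean()
    b = remote_norm - remote_norm.mean()
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    if denom == 0.0:
        return False
    similarity = float(np.dot(a, b)) / denom  # normalized cross-correlation
    # High similarity means both terminals vibrated together, which is
    # characteristic of walking or chewing rather than a one-sided tap.
    return similarity > SIMILARITY_THRESHOLD
```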
  • FIG. 9 is a diagram showing an example of vibration caused by an action other than the tap operation, namely the waveform of the vibration generated by the user chewing during a meal.
  • The vertical axis represents acceleration and the horizontal axis represents time (the same applies to the graph of FIG. 10 described later). A predetermined band-pass filter is applied to the waveforms shown in FIG. 9.
  • A in FIG. 9 shows the waveform of the sensor data of the IMU 121L mounted on the left ear terminal 10L, and B in FIG. 9 shows the waveform of the sensor data of the IMU 121R mounted on the right ear terminal 10R.
  • FIG. 10 is a diagram showing another example of vibration caused by an action other than the tap operation, namely the waveform of the vibration generated by the user walking. A predetermined band-pass filter is applied to the waveforms shown in FIG. 10.
  • A in FIG. 10 shows the waveform of the sensor data of the IMU 121L mounted on the left ear terminal 10L, and B in FIG. 10 shows the waveform of the sensor data of the IMU 121R mounted on the right ear terminal 10R.
  • Because the earphone 10 can detect the user's chewing or walking in the sensor data of both terminals, it can distinguish the tap operation, which, unlike chewing or walking, is performed on only one of the left ear side and the right ear side.
  • As described above, in the earphone 10, the tap operation by the user is detected based on the information of the sensors mounted on each of the left ear terminal 10L and the right ear terminal 10R, and the detection of a tap caused by an action other than the tap operation is canceled after the operation recognition sound is reproduced.
  • Since the detection of taps caused by actions other than the tap operation is canceled based on the information of the sensors mounted on both terminals, the earphone 10 can detect the tap operation more accurately than when the detection is based on the information of the sensor mounted on only one terminal.
  • If sensor information were constantly exchanged between the terminals, the band available for transmitting music data would be narrowed, and sound interruption would occur when the radio wave condition is poor. In the earphone 10, however, the sensor information needs to be transmitted only when a tap is detected. By transmitting the sensor information only at such times and otherwise giving priority to the transmission of music data between the terminals, the sound interruption caused by transmitting sensor information can be reduced.
  • Since the operation recognition sound is reproduced immediately, the earphone 10 can quickly provide feedback on the user's operation, and the user can immediately confirm that the operation has been detected.
  • By hearing the cancel sound, the user can confirm that a false detection of a tap caused by a movement not intended as a tap has been canceled.
  • A cancel operation by the user may also be accepted between the time the operation recognition sound is played and the time the function execution sound is played. In that case, by performing the cancel operation before the function execution sound is played, the user can cancel the tap detection by the earphone 10.
  • FIG. 11 is a diagram showing an example of how the earphone 10 is used.
  • In normal use, the left ear terminal 10L is attached to the left ear of the user U and the right ear terminal 10R is attached to the right ear of the user U.
  • The right ear terminal 10R can also be used alone, attached to the right ear of the user U. When only one terminal is used in this way, the operation of the right ear terminal 10R is changed.
  • Specifically, the acceleration threshold at which a tap is judged to have been detected is set to a larger value. This makes it possible to reduce erroneous detections in which an action other than the tap operation is detected as a tap, even when the right ear terminal 10R is used alone.
  • The threshold may also be applied to an evaluation value obtained by performing various arithmetic processing or machine learning processing on the value of the acceleration sensor, as sketched below.
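  • A minimal sketch of this single-ear threshold switch follows; both numeric values are assumptions, since the publication gives no concrete thresholds.

```python
# When only one terminal is worn, the cross-terminal noise check is
# unavailable, so the tap threshold is raised.  Values are assumed.
TAP_THRESHOLD_BOTH_EARS = 3.0   # g
TAP_THRESHOLD_SINGLE_EAR = 4.5  # g


def tap_threshold(both_terminals_in_use: bool) -> float:
    """Return the acceleration (or evaluation value) threshold for taps."""
    if both_terminals_in_use:
        return TAP_THRESHOLD_BOTH_EARS
    return TAP_THRESHOLD_SINGLE_EAR
```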
  • The function assigned to the tap event in the right ear terminal 10R and the function assigned to the tap event in the left ear terminal 10L can be different functions. Furthermore, when the right ear terminal 10R is used alone, a function different from the one used when both terminals are worn may be assigned to its tap event.
  • <Second Embodiment> When a tap is detected using the sensor data of the IMU 121, the detection may be affected by noise, such as detecting an action other than the user's tap operation or detecting vibration generated by playing music.
  • Noise can be reduced by subtracting the acceleration value of the noise from the sensor data, or by excluding the timing at which the noise occurs from the tap detection period.
  • By reducing the noise, the earphone 10 can improve the accuracy of detecting the user's tap operation.
  • Two examples of reducing noise can be considered.
  • First example: recording the time variation of the noise caused by walking or chewing, and canceling that noise when a tap is detected.
  • Second example: predicting the vibration from the sound signal being reproduced, and canceling the predicted vibration.
  • FIG. 12 is a block diagram showing a functional configuration example of the earphone 10 in the first example.
  • The information processing unit 161 is realized by the CPU 101 of FIG. 2 executing a predetermined program.
  • The configuration shown in FIG. 12 can be provided in each of the left ear terminal 10L and the right ear terminal 10R, or in only one of them. The same applies to FIG. 18 described later.
  • The information processing unit 161 is composed of a tap detection unit 171 and an execution unit 172.
  • The tap detection unit 171 detects a tap based on the current sensor data of the IMU 121 and past sensor data. The tap detection unit 171 includes an acquisition unit 181, a calculation unit 182, and a determination unit 183.
  • The acquisition unit 181 acquires the sensor data from the IMU 121 and detects vibration peaks based on the sensor data. The acquisition unit 181 also acquires the history of vibration peaks detected in the past. For example, information representing the peaks previously detected by the acquisition unit 181 is stored as a history in the storage unit 109 of FIG. 2 and read out by the acquisition unit 181.
  • The calculation unit 182 calculates the intervals between peaks based on the history and the sensor data acquired by the acquisition unit 181.
  • The determination unit 183 determines whether to recognize the peak detected by the acquisition unit 181 as a tap based on the calculation result of the calculation unit 182. That is, the determination unit 183 functions as a recognition unit that recognizes taps based on the calculation result of the calculation unit 182.
  • The execution unit 172 executes processing according to the event in which a tap is detected by the tap detection unit 171.
  • The waveform of the vibration generated by the user's tap operation may resemble the waveform of the vibration generated when a walking user's foot lands.
  • The vibration waveform generated by the landing depends on the shoes worn by the user and the way the user walks. For example, when a user wearing shoes with hard soles such as high heels lands forcefully, a vibration similar to the one generated by the tap operation is produced.
  • FIG. 13 is a diagram showing an example of the vibration waveform generated by landing during walking.
  • The vibration generated by landing during walking is periodic. As shown in FIG. 13, its waveform is the waveform W11, which shows a peak at a constant period T.
  • The earphone 10 can therefore predict the landing timing during walking. Although walking habits differ from person to person, the earphone 10 can predict the next vibration based on the vibration detected in the past.
  • The earphone 10 records a periodic vibration pattern such as walking and predicts the timing of the next landing based on that pattern. Vibration detected around the time predicted to be a landing is not recognized as a tap.
  • FIG. 14 is a diagram showing an example of the waveform of the sensor data of the IMU 121 when the vibration generated by landing during walking is detected.
  • The solid line portion of the waveform W12 in FIG. 14 represents the vibration detected in the past by the IMU 121.
  • The earphone 10 applies a low-pass filter to the acceleration values serving as the sensor data of the IMU 121 and calculates the history of peaks within a certain period. The time at which the norm of the acceleration exceeds a predetermined threshold and reaches its maximum within the period is recorded as the peak timing.
  • In the example of FIG. 14, three peaks are detected: the interval from the first peak to the second peak is T0, and the interval from the second peak to the third peak is T1.
  • When a peak Pa that may be a tap is newly detected, the calculation unit 182 calculates the interval between the peak Pa and the nearest preceding peak. The broken line portion of the waveform W12 in FIG. 14 represents the vibration including the newly detected peak Pa; here, the interval between the peak Pa and the nearest peak is calculated to be Ta.
  • When the interval Ta is close to the intervals T0 and T1 of the peaks detected in the past, the determination unit 183 does not recognize the peak Pa as a tap but recognizes it as noise.
  • FIG. 15 is a diagram showing the waveform of the sensor data of the IMU 121 when the vibration generated by a tap operation performed after walking is detected.
  • The solid line portion of the waveform W13 in FIG. 15 represents the vibration detected in the past by the IMU 121, and the peak history is calculated in the same manner as for the waveform W12 of FIG. 14.
  • When a peak Pb that may be a tap is newly detected, the calculation unit 182 calculates the interval between the peak Pb and the nearest preceding peak. The broken line portion of the waveform W13 in FIG. 15 represents the vibration including the newly detected peak Pb; here, the interval is calculated to be Tb.
  • Since the interval Tb differs from the intervals of the peaks detected in the past, the determination unit 183 recognizes that the vibration including the peak Pb is likely to be a tap.
  • In this way, the tap detection unit 171 determines whether to recognize the vibration including a newly detected peak as a tap based on the intervals between the peaks detected in the past and the interval of the newly detected peak.
  • In step S201, the acquisition unit 181 acquires the history of vibration peaks detected in the past.
  • In step S202, the calculation unit 182 calculates the average T_ave of the intervals between the peaks of the vibrations detected in the past.
  • In step S203, the acquisition unit 181 detects a new peak Pa.
  • In step S204, the calculation unit 182 calculates the interval T_new between the peak Pa and the peak immediately before it.
  • In step S205, the determination unit 183 determines whether the difference between the average interval T_ave and the interval T_new is equal to or less than a threshold value (THRES).
  • If it is determined in step S205 that the difference between T_ave and T_new is equal to or less than the threshold, the process proceeds to step S206.
  • In step S206, the determination unit 183 does not recognize the vibration including the peak Pa as a tap but recognizes it as noise. The process then ends.
  • If it is determined in step S205 that the difference between T_ave and T_new exceeds the threshold, the process proceeds to step S207.
  • In step S207, the determination unit 183 recognizes the vibration including the peak Pa as a vibration that may be a tap, and the tap detection unit 171 subsequently executes the tap detection algorithm, detecting the tap based on the sensor data of the IMU 121. The execution unit 172 then executes the processing corresponding to the event in which the tap was detected by the tap detection unit 171.
  • As described above, whether the vibration including a newly detected peak was generated by the tap operation is recognized based on the intervals between the peaks of the vibrations detected in the past and the interval of the newly detected peak. If the vibration including the newly detected peak is recognized as noise, it is not detected as a tap.
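  • A sketch of this interval-based rejection (steps S201 through S206) follows; the THRES value and the exact peak representation are assumptions.

```python
import numpy as np

INTERVAL_THRES_S = 0.08  # assumed THRES value; not given in the publication


def is_walking_noise(peak_history_s: list, new_peak_s: float) -> bool:
    """Reject a new peak whose spacing matches the walking period.

    peak_history_s: timestamps (seconds) of previously detected peaks.
    new_peak_s: timestamp of the newly detected peak Pa (step S203).
    """
    if len(peak_history_s) < 2:
        return False  # no periodicity to compare against yet
    intervals = np.diff(peak_history_s)
    t_ave = float(intervals.mean())            # step S202
    t_new = new_peak_s - peak_history_s[-1]    # step S204
    # Step S205: a spacing close to the historical average looks like
    # another footstep, so the peak is treated as noise (step S206).
    return abs(t_ave - t_new) <= INTERVAL_THRES_S
```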
  • The tap detection unit 171 of FIG. 12 may be provided in the information processing unit 141 instead of the one-ear tap detection unit 151 of FIG. 3.
  • In this case, whether the event in which the tap detection unit 171 detected a tap is a tap event is determined by the tap determination unit 154: after the process of step S207 described above, the processing described with reference to FIGS. 6 and 8 is performed to determine whether the event is a tap event.
  • At this time, the event information or the sensor data need not be transmitted to the other terminal; instead, information indicating that vibration generated by what appears to be walking has been detected may be transmitted to the other terminal. As a result, the amount of data communicated between the left ear terminal 10L and the right ear terminal 10R can be reduced.
  • Alternatively, the vibration generated by a single landing during walking may be learned by a method such as machine learning, and whether a newly detected vibration peak is a peak generated by landing during walking may be recognized based on the learning result.
  • The sensor data used as teacher data may be labeled with the user's behavior detected by a sensor other than the IMU 121. For example, when the user's walking is detected, the sensor data for that period is labeled as walking sensor data.
  • The noise generated by walking may also be observed constantly by the earphone 10 using the sensor data of the IMU 121. The earphone 10 can record the sensor data of the IMU 121 as it is, or record only the acceleration values exceeding a threshold.
  • The result of learning the noise based on the recorded sensor data and the acceleration values exceeding the threshold is used in the gesture recognition process that detects taps.
  • The sensor data and the acceleration values exceeding the threshold may be uploaded to a device connected to the earphone 10 (such as a smartphone) or to a server provided as a cloud. In that case, the earphone 10 acquires the result of the noise learning performed externally and performs the gesture recognition process using that result.
  • The noise learning process of FIG. 17 is performed before the gesture recognition process.
  • In step S301 of the noise learning process, the acquisition unit 181 (FIG. 12) acquires the noise generated by walking. Specifically, the history of the noise detected and recorded in the past is acquired by the acquisition unit 181.
  • In step S302, the calculation unit 182 calculates a noise removal filter that removes the noise caused by walking based on the noise history acquired in step S301. The noise removal filter calculated in step S302 is used in step S352 of the gesture recognition process.
  • In step S351 of the gesture recognition process, the acquisition unit 181 acquires the sensor data of the IMU 121.
  • In step S352, the calculation unit 182 applies the noise removal filter to the sensor data of the IMU 121 acquired in step S351 and corrects the sensor data.
  • In step S353, the tap detection unit 171 executes the tap detection algorithm using the corrected sensor data.
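  • The publication does not specify the form of the noise removal filter. As one possible realization, the sketch below averages recorded walking-noise segments into a template (step S302) and subtracts it from incoming sensor data (step S352); the alignment and equal-length assumptions are illustrative.

```python
import numpy as np


def learn_noise_template(noise_segments: list) -> np.ndarray:
    """Step S302 (one possible realization): average recorded walking-noise
    segments into a single template.  Segments are assumed to be
    acceleration-norm windows of equal length, aligned on the landing peak.
    """
    return np.mean(np.stack(noise_segments), axis=0)


def apply_noise_removal(sensor_window: np.ndarray,
                        template: np.ndarray) -> np.ndarray:
    """Step S352: correct the sensor data by subtracting the learned
    template before the tap detection algorithm runs (step S353)."""
    return sensor_window - template
```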
  • FIG. 18 is a block diagram showing a functional configuration example of the earphone 10 in the second example.
  • In FIG. 18, the same components as those of the earphone 10 in FIG. 12 are designated by the same reference numerals, and duplicate explanations are omitted as appropriate.
  • The configuration of the information processing unit 161 shown in FIG. 18 differs from the configuration described with reference to FIG. 12 in that the tap detection unit 171 has a correction unit 185 instead of the calculation unit 182.
  • The acquisition unit 181 additionally acquires the music reproduction signal as the music data to be reproduced.
  • The determination unit 183 determines whether the power of the music reproduction signal acquired by the acquisition unit 181 is equal to or greater than a threshold value.
  • The correction unit 185 corrects the sensor data acquired by the acquisition unit 181 based on the music reproduction signal. Specifically, the correction unit 185 predicts the vibration generated by reproducing the music reproduction signal and corrects the sensor data by subtracting the acceleration value of the predicted vibration from the sensor data.
  • The tap detection unit 171 detects a tap based on the corrected sensor data. That is, the tap detection unit 171 functions as a recognition unit that recognizes taps based on the sensor data corrected by the correction unit 185.
  • FIG. 19 is a diagram schematically showing an example of the structure of the earphone 10. The left ear terminal 10L and the right ear terminal 10R have the same structure.
  • The diaphragm 191 provided in the earphone 10 vibrates to generate sound. Since the IMU 121 is provided in the vicinity of the diaphragm 191, the vibration of the diaphragm 191 may be detected by the IMU 121. Depending on the sound, the diaphragm 191 may generate vibration similar to the vibration generated by the tap operation.
  • FIG. 20 is a diagram showing an example of the waveform of a music reproduction signal. The vertical axis represents the power of the music reproduction signal, and the horizontal axis represents time.
  • Based on a music reproduction signal such as the waveform W21 of FIG. 20, the determination unit 183 predicts the peaks of the vibration generated by reproducing the signal. During the period around the peak timing, the tap detection unit 171 refrains from detecting taps. Alternatively, the correction unit 185 corrects the sensor data based on the predicted vibration.
  • That is, there are a first method of not detecting taps in the period around the timing at which a peak of the music reproduction signal is reproduced, and a second method of correcting the sensor data based on the music reproduction signal. The flow of each process is explained below.
  • In step S401, the acquisition unit 181 acquires the acceleration values as the sensor data of the IMU 121.
  • In step S402, the acquisition unit 181 acquires the music reproduction signal to be reproduced by the earphone 10.
  • In step S403, the determination unit 183 determines whether the power of the music reproduction signal over a certain past period is equal to or greater than the threshold value. For example, the determination unit 183 uses the power of the music reproduction signal reproduced during a certain period up to the timing at which the sensor data of the IMU 121 was acquired.
  • If it is determined in step S403 that the power of the music reproduction signal over the certain past period is equal to or greater than the threshold, the process ends; that is, tap detection by the tap detection unit 171 is suppressed (not performed).
  • If it is determined in step S403 that the power of the music reproduction signal over the certain past period is less than the threshold, the process proceeds to step S404, in which the information processing unit 161 executes the tap detection algorithm described above.
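  • A sketch of this first method follows. The power definition (mean square over the recent window) and the POWER_THRESHOLD value are assumptions; detect_tap stands in for the tap detection algorithm described above.

```python
import numpy as np

POWER_THRESHOLD = 0.1  # assumed threshold on mean-square signal power


def music_power(signal_window: np.ndarray) -> float:
    """Mean-square power of the recently reproduced music signal
    (one simple definition; the publication gives no formula)."""
    return float(np.mean(signal_window ** 2))


def run_tap_detection_gated(signal_window: np.ndarray,
                            accel_window: np.ndarray,
                            detect_tap) -> bool:
    """First method (steps S401 to S404): skip tap detection while loud
    music is playing, since the diaphragm vibration near the IMU could
    be mistaken for a tap."""
    if music_power(signal_window) >= POWER_THRESHOLD:  # step S403
        return False  # tap detection suppressed
    return detect_tap(accel_window)                    # step S404
```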
  • In step S451, the acquisition unit 181 acquires the acceleration values as the sensor data of the IMU 121.
  • In step S452, the acquisition unit 181 acquires the music reproduction signal to be reproduced by the earphone 10.
  • In step S453, the determination unit 183 determines whether the power of the music reproduction signal is equal to or greater than the threshold value. For example, the determination unit 183 uses the power of the music reproduction signal reproduced at the timing when the sensor data of the IMU 121 was acquired.
  • If it is determined in step S453 that the power of the music reproduction signal is equal to or greater than the threshold, the process proceeds to step S454, in which the correction unit 185 corrects the sensor data by subtracting from the acceleration values the acceleration values of the vibration predicted to be generated by reproducing the music reproduction signal.
  • If it is determined in step S453 that the power of the music reproduction signal is less than the threshold, the process of step S454 is skipped.
  • In step S455, the tap detection unit 171 executes the tap detection algorithm described above.
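  • A sketch of the second method follows. How the vibration is predicted from the music reproduction signal is not specified in the publication; the linear COUPLING_GAIN model below is purely an illustrative assumption.

```python
import numpy as np

COUPLING_GAIN = 0.02    # assumed signal-to-acceleration coupling factor
POWER_THRESHOLD = 0.1   # same assumed threshold as in the first method


def correct_sensor_data(accel_norm: np.ndarray,
                        signal_window: np.ndarray) -> np.ndarray:
    """Second method (steps S453 and S454): subtract the vibration
    predicted from the music reproduction signal before tap detection.

    accel_norm and signal_window are assumed to be time-aligned arrays
    of equal length.
    """
    if float(np.mean(signal_window ** 2)) >= POWER_THRESHOLD:  # step S453
        predicted_vibration = COUPLING_GAIN * np.abs(signal_window)
        return accel_norm - predicted_vibration                # step S454
    return accel_norm  # weak music: no correction needed
```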
  • As described above, the vibration generated by reproducing the music reproduction signal is predicted, and based on the predicted vibration, tap detection is suppressed or the sensor data of the IMU 121 is corrected.
  • As a result, the earphone 10 does not erroneously detect the vibration generated by playing music as a tap, and can reliably detect as a tap the vibration of a tap operation whose peaks differ from the vibration of the output sound.
  • FIG. 23 is a block diagram showing another hardware configuration example of the earphone 10.
  • In FIG. 23, the same components as those of the earphone 10 in FIG. 2 are designated by the same reference numerals, and duplicate explanations are omitted as appropriate.
  • The configuration of the earphone 10 shown in FIG. 23 differs from the configuration described with reference to FIG. 2 in that the sensor unit 107L of the left ear terminal 10L is provided with an electrostatic sensor 201L and the sensor unit 107R of the right ear terminal 10R is provided with an electrostatic sensor 201R.
  • The electrostatic sensors 201L and 201R are composed of, for example, sensors of the XY-coordinate detection type or the electrostatic button detection type, and output signals corresponding to the user's contact with them as sensor data.
  • FIG. 24 is a block diagram showing a functional configuration example of the earphone 10.
  • The information processing unit 211 is realized by the CPU 101 of FIG. 23 executing a predetermined program.
  • The configuration shown in FIG. 24 can be provided in each of the left ear terminal 10L and the right ear terminal 10R, or in only one of them.
  • The information processing unit 211 is composed of a tap detection unit 221, a tap determination unit 222, and an execution unit 223.
  • The tap detection unit 221 acquires sensor data from the sensor unit 107 and detects the main body tap and the face tap based on the sensor data of the IMU 121. The tap detection unit 221 also detects the main body tap based on the sensor data of the electrostatic sensor 201. The main body tap and the face tap are described later.
  • The tap detection result based on the sensor data of the IMU 121 and the tap detection result based on the sensor data of the electrostatic sensor 201 are supplied to the tap determination unit 222.
  • The tap determination unit 222 distinguishes between the main body tap and the face tap based on the detection results supplied from the tap detection unit 221 and, based on the identification result, controls the execution unit 223 to execute the function assigned to the main body tap or the function assigned to the face tap.
  • The execution unit 223 executes the function assigned to the main body tap or the function assigned to the face tap under the control of the tap determination unit 222.
  • FIG. 25 is a diagram showing an example of the main body tap and the face tap.
  • The main body tap means that the user taps the housing of the earphone 10. A in FIG. 25 shows a situation in which the user U is tapping the housing of the left ear terminal 10L attached to the left ear.
  • The main body tap is detected based on the sensor data of the electrostatic sensor 201 installed in the area A11 on part of the housing of the earphone 10. Since the electrostatic sensor 201 is installed in only part of the housing, the user U may tap a part other than the area A11 when attempting a main body tap. Therefore, the main body tap is detected using the sensor data of the IMU 121 together with the sensor data of the electrostatic sensor 201.
  • The face tap means that the user taps around the ear where the earphone 10 is worn. B in FIG. 25 shows a situation in which the user U is tapping around the left ear to which the left ear terminal 10L is attached. The area A12 around the left ear is defined as the area where a face tap can be detected.
  • FIG. 26 is a diagram showing the distributions of the tap intensities of the main body tap and the face tap. The vertical axis represents frequency and the horizontal axis represents tap intensity. The waveform W31 represents the distribution of the tap intensity of the main body tap, and the waveform W32 represents that of the face tap. The tap intensity represents the intensity of the vibration detected by the IMU 121 in response to a tap.
  • As shown in FIG. 26, the tap intensity of the main body tap is high and that of the face tap is low. Accordingly, when a strong vibration is detected by the IMU 121, the earphone 10 judges that a main body tap has been performed; when a weak vibration is detected, it judges that a face tap has been performed.
  • In addition, the earphone 10 can identify the main body tap based on the sensor data of the electrostatic sensor 201.
  • The process of FIG. 27 is started, for example, when a tap-like vibration is detected by the IMU 121.
  • In step S501, the tap detection unit 221 acquires the acceleration values as sensor data from the IMU 121 and detects a main body tap or a face tap based on the sensor data of the IMU 121.
  • In step S502, the tap detection unit 221 acquires the sensor data from the electrostatic sensor 201 and detects a main body tap based on the sensor data of the electrostatic sensor 201.
  • In step S503, the tap determination unit 222 determines whether a tap has been detected by the electrostatic sensor 201. For example, it is determined that a tap has been detected when the user's contact is detected by the electrostatic sensor 201.
  • If it is determined in step S503 that a tap has been detected by the electrostatic sensor 201, the process proceeds to step S504.
  • In step S504, the tap determination unit 222 determines that a main body tap has been performed by the user. The execution unit 223 then executes the function assigned to the main body tap.
  • After it is determined in step S503 that a tap has been detected by the electrostatic sensor 201, it may additionally be determined whether a main body tap has been detected based on the sensor data of the IMU 121. In that case, the process of step S504 is performed only when it is determined that a main body tap has been detected based on the sensor data of the IMU 121; otherwise, it is determined that the electrostatic sensor 201 detected noise, and the process ends. In this way, when the electrostatic sensor 201 detects contact by the user's hair, for example, the contact can be prevented from being judged as a main body tap.
  • If it is determined in step S503 that no tap has been detected by the electrostatic sensor 201, the process proceeds to step S505.
  • In step S505, the tap determination unit 222 determines whether a main body tap has been detected by the IMU 121. For example, when the tap intensity measured based on the sensor data of the IMU 121 is higher than a predetermined threshold, it is determined that a main body tap has been detected by the IMU 121.
  • If it is determined in step S505 that a main body tap has been detected by the IMU 121, the process proceeds to step S504: it is determined that a main body tap has been performed by the user as described above, and the function assigned to the main body tap is executed.
  • If it is determined in step S505 that no main body tap has been detected by the IMU 121, the process proceeds to step S506.
  • In step S506, the tap determination unit 222 determines whether a face tap has been detected by the IMU 121. For example, when the tap intensity measured based on the sensor data of the IMU 121 is lower than the predetermined threshold, it is determined that a face tap has been detected by the IMU 121.
  • If it is determined in step S506 that no face tap has been detected by the IMU 121, for example because no vibration was detected, the process ends.
  • If it is determined in step S506 that a face tap has been detected by the IMU 121, the process proceeds to step S507.
  • In step S507, the tap determination unit 222 determines that a face tap has been performed by the user. The execution unit 223 then executes the function assigned to the face tap.
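  • The branching of steps S503 through S507 can be summarized in a short sketch; the intensity threshold and helper names are assumptions, not values from the publication.

```python
from enum import Enum, auto

BODY_TAP_INTENSITY = 3.0  # assumed threshold separating body and face taps


class TapKind(Enum):
    BODY = auto()
    FACE = auto()
    NONE = auto()


def classify_tap(electrostatic_touched: bool, imu_intensity: float) -> TapKind:
    """Combine the electrostatic sensor 201 and the IMU 121.

    electrostatic_touched: contact detected by the electrostatic sensor.
    imu_intensity: tap intensity measured from the IMU sensor data
    (0.0 if no vibration was detected).
    """
    if electrostatic_touched:                        # step S503
        return TapKind.BODY                          # step S504
    if imu_intensity > BODY_TAP_INTENSITY:           # step S505
        return TapKind.BODY                          # strong vibration
    if 0.0 < imu_intensity <= BODY_TAP_INTENSITY:    # step S506
        return TapKind.FACE                          # weak vibration: face tap
    return TapKind.NONE                              # nothing detected
```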
  • As described above, the main body tap and the face tap are identified based on the sensor data of the electrostatic sensor 201 and the sensor data of the IMU 121.
  • The earphone 10 can thus identify a main body tap based on the sensor data of the IMU 121 even when the user touches an area other than the one where the electrostatic sensor 201 is installed, and can identify a main body tap based on the sensor data of the electrostatic sensor 201 even when the user taps the area where the electrostatic sensor 201 is installed with a weak force.
  • An image representing the above-mentioned information may be displayed on the screen of an external device such as a smartphone, and vibration or sound representing the information may be output from an external device. The sound, vibration, and image representing the information may also be output in combination.
  • The information processing unit 141 in the first embodiment, the information processing unit 161 in the second embodiment, and the information processing unit 211 in the third embodiment have been described as being provided in the earphone 10, but part or all of these information processing units may be provided in a device external to the earphone 10.
  • For example, the tap determination unit 154 may be provided in a smartphone connected to the earphone 10 by wireless or wired communication. In this case, the smartphone determines whether the event in which a tap was detected by the one-ear tap detection unit 151 is a tap event, and the determination result is transmitted to the earphone 10.
  • The series of processes described above can be executed by hardware or by software. When it is executed by software, the programs constituting the software are installed on a computer embedded in dedicated hardware, a general-purpose personal computer, or the like.
  • The installed program is provided by being recorded on removable media such as an optical disc (CD-ROM (Compact Disc-Read Only Memory), DVD (Digital Versatile Disc), etc.) or semiconductor memory, or via a wired or wireless transmission medium such as a local area network, the Internet, or digital broadcasting. The program can also be installed in advance in the ROM 102 or the storage unit 109 shown in FIGS. 3 and 24.
  • The program executed by the computer may be a program in which the processing is performed in chronological order following the sequence described in this specification, or a program in which the processing is performed in parallel or at necessary timings, such as when a call is made.
  • The present technology can also take a cloud computing configuration in which one function is shared and processed jointly by multiple devices via a network.
  • Each step described in the above flowcharts can be executed by one device or shared among multiple devices. Likewise, when one step includes multiple processes, those processes can be executed by one device or shared among multiple devices.
The present technology can also have the following configurations.

(1)
An information processing device including two terminals equipped with sensors that detect vibration including vibration representing a user's operation and noise, wherein one of the terminals includes:
a receiving unit that receives a detection result by a first sensor mounted on the other terminal; and
a determination unit that determines, based on the detection result received by the receiving unit, whether or not a detection result by a second sensor mounted on the one terminal is noise.

(2)
The information processing device according to (1) above, wherein the one terminal further includes an execution unit that executes processing according to the detection result by the second sensor when the determination unit determines that the detection result by the second sensor is not noise.

(3)
The information processing device according to (2) above, wherein the one terminal further includes a control unit that performs control to notify that the detection result by the second sensor is noise.

(4)
The information processing device according to (3) above, wherein the control unit performs control to output at least one of a sound, a vibration, and an image notifying that the detection result by the second sensor is noise.

(5)
The information processing device according to (3) or (4) above, wherein the control unit performs control to notify that the detection result by the second sensor is noise at a timing between the timing when the vibration is detected by the second sensor and the timing when the processing is executed by the execution unit.

(6)
The information processing device according to any one of (3) to (5) above, wherein the control unit further performs control to output a sound at the timing when the vibration is detected by the second sensor.

(7)
The information processing device according to any one of (3) to (6) above, wherein the control unit further performs control to output a sound at the timing when the processing is executed by the execution unit.

(8)
The information processing device according to any one of (1) to (7) above, wherein the determination unit calculates a degree of similarity between the detection result by the first sensor and the detection result by the second sensor, and determines whether or not the detection result by the second sensor is noise based on the calculated degree of similarity.

(9)
The information processing device according to (8) above, wherein the determination unit determines that the detection result by the second sensor is noise when the degree of similarity is higher than a predetermined threshold value.

(10)
The information processing device according to any one of (1) to (9) above, wherein the receiving unit receives event information indicating that the user's operation has been detected according to the sensor data of the first sensor, and the determination unit determines, based on the event information received by the receiving unit, whether or not the detection of the user's operation according to the sensor data of the second sensor is valid.

(11)
The information processing device according to any one of (1) to (9) above, wherein the receiving unit receives the sensor data of the first sensor as the detection result by the first sensor, and the determination unit determines whether or not the sensor data of the second sensor is noise based on the sensor data received by the receiving unit.

(12)
The information processing device according to any one of (1) to (11) above, further including a sound output unit that outputs a sound corresponding to a sound signal according to the operation of the user.

(13)
An information processing method for an information processing device including two terminals equipped with sensors that detect vibration, wherein one of the terminals:
receives a detection result by a first sensor mounted on the other terminal; and
determines, based on the received detection result, whether or not a detection result by a second sensor mounted on the one terminal is noise.

(14)
An information processing device including:
a sensor unit that detects vibration including vibration representing a user's operation and noise; and
a recognition unit that recognizes the user's operation based on a detection result by the sensor unit and a prediction result of noise detected by the sensor unit.

(15)
The information processing device according to (14) above, wherein the recognition unit recognizes the user's operation based on a result of comparing the peak interval of the vibration detected by the sensor unit with a noise period predicted based on periodic vibration detected in the past.

(16)
The information processing device according to (14) above, wherein the recognition unit corrects the detection result by the sensor unit based on a learning result of the noise detected by the sensor unit, and recognizes the user's operation based on the corrected detection result.

(17)
The information processing device according to any one of (14) to (16) above, further including a sound output unit that outputs a sound based on a sound signal, wherein the recognition unit recognizes the user's operation based on the detection result by the sensor unit and a prediction result of noise generated by the reproduction of the sound signal by the sound output unit.

(18)
The information processing device according to (17) above, wherein the recognition unit determines whether or not the detection result by the sensor unit is noise based on the vibration generated by the sound output from the sound output unit, and recognizes the user's operation based on the detection result determined not to be noise.

(19)
The information processing device according to (17) above, wherein the recognition unit corrects the detection result by the sensor unit based on the vibration generated by the sound output from the sound output unit, and recognizes the user's operation based on the corrected detection result.

(20)
The information processing device according to any one of (14) to (19) above, including two terminals on which the sensor unit is mounted, or one terminal on which the sensor unit is mounted, wherein the terminal has the recognition unit.

(21)
An information processing device including:
a contact sensor that detects contact by a user;
a vibration sensor that detects vibration; and
a determination unit that determines whether or not the user has touched a housing based on a detection result by the contact sensor and a detection result by the vibration sensor.

(25)
The information processing device according to (24) above, further including an execution unit that executes the above-mentioned function.

(26)
The information processing device according to any one of (21) to (25) above, including two terminals equipped with the contact sensor and the vibration sensor, or one terminal equipped with the contact sensor and the vibration sensor, wherein the terminal has the determination unit, and the housing is a housing of the terminal.

Abstract

The present technology relates to an information processing device and an information processing method for making it possible to identify a vibration generated by an operation by a user. The information processing device of the present technology is provided with two terminals equipped with sensors for detecting vibration that includes vibration representing an operation by a user and noise, wherein one terminal comprises a reception unit for receiving a result of detection by a first sensor with which the other terminal is equipped, and a determination unit for determining, on the basis of the result of detection received by the reception unit, whether the result of detection by a second sensor with which the one terminal is equipped is noise. The present technology may be applied, for example, to true wireless earbuds.

Description

Information processing device and information processing method
The present technology relates to an information processing device and an information processing method, and more particularly to an information processing device and an information processing method capable of identifying vibration generated by a user's operation.

In recent years, users have been able to operate earphones by various operation methods. For example, Patent Document 1 describes a small terminal that a user can operate by gesture, voice, and button operation.

Japanese Unexamined Patent Publication No. 2017-207890

When an acceleration sensor is mounted on an earphone, vibration generated by the user tapping around the earphone can be detected. The earphone recognizes the user's operation when the vibration is detected by the acceleration sensor.

Meanwhile, the vibration generated by the user tapping around the earphone may resemble noise such as vibration caused by the user walking or chewing, or vibration of the sound output from the earphone. Consequently, when the user's operation is recognized by the acceleration sensor mounted on the earphone detecting vibration, it has been difficult to distinguish the vibration generated by the user's operation from noise.

The present technology has been made in view of such a situation, and makes it possible to identify vibration generated by a user's operation.

An information processing device according to a first aspect of the present technology includes two terminals equipped with sensors that detect vibration including vibration representing a user's operation and noise, wherein one of the terminals has a receiving unit that receives a detection result by a first sensor mounted on the other terminal, and a determination unit that determines, based on the detection result received by the receiving unit, whether or not a detection result by a second sensor mounted on the one terminal is noise.

An information processing method according to the first aspect of the present technology is an information processing method for an information processing device including two terminals equipped with sensors that detect vibration, wherein one of the terminals receives a detection result by a first sensor mounted on the other terminal, and determines, based on the received detection result, whether or not a detection result by a second sensor mounted on the one terminal is noise.

An information processing device according to a second aspect of the present technology includes a sensor unit that detects vibration including vibration representing a user's operation and noise, and a recognition unit that recognizes the user's operation based on a detection result by the sensor unit and a prediction result of noise detected by the sensor unit.

In the first aspect of the present technology, in one of two terminals equipped with sensors that detect vibration including vibration representing a user's operation and noise, a detection result by a first sensor mounted on the other terminal is received, and based on the received detection result, it is determined whether or not a detection result by a second sensor mounted on the one terminal is noise.

In the second aspect of the present technology, a user's operation is recognized based on a detection result by a sensor unit that detects vibration including vibration representing the user's operation and noise, and a prediction result of noise detected by the sensor unit.
FIG. 1 is a diagram showing an example of the appearance of an earphone according to an embodiment of the present technology.
FIG. 2 is a block diagram showing a hardware configuration example of the earphone.
FIG. 3 is a block diagram showing a functional configuration example of the earphone.
FIG. 4 is a diagram showing an example of the playback timing of sound effects.
FIG. 5 is a diagram showing the flow of information in a first method.
FIG. 6 is a flowchart explaining the flow of processing executed by the earphone.
FIG. 7 is a diagram showing the flow of information in a second method.
FIG. 8 is a flowchart explaining the flow of processing executed by the earphone.
FIG. 9 is a diagram showing an example of vibration caused by a movement other than a tap operation.
FIG. 10 is a diagram showing another example of vibration caused by a movement other than a tap operation.
FIG. 11 is a diagram showing an example of how the earphone is used.
FIG. 12 is a block diagram showing a functional configuration example of the earphone in a first example.
FIG. 13 is a diagram showing an example of the waveform of vibration caused by landing during walking.
FIG. 14 is a diagram showing an example of the waveform of IMU sensor data when vibration caused by landing during walking is detected.
FIG. 15 is a diagram showing the waveform of IMU sensor data when vibration caused by a tap operation performed after walking is detected.
FIG. 16 is a flowchart explaining the flow of processing executed by an information processing unit.
FIG. 17 is a sequence diagram explaining the flow of noise learning processing and gesture recognition processing executed by the information processing unit.
FIG. 18 is a block diagram showing a functional configuration example of the earphone in a second example.
FIG. 19 is a diagram schematically showing an example of the structure of the earphone.
FIG. 20 is a diagram showing an example of the waveform of a music playback signal.
FIG. 21 is a flowchart explaining the flow of processing executed by the information processing unit.
FIG. 22 is a flowchart explaining the flow of processing executed by the information processing unit.
FIG. 23 is a block diagram showing another hardware configuration example of the earphone.
FIG. 24 is a block diagram showing a functional configuration example of the earphone.
FIG. 25 is a diagram showing an example of a main body tap and a face tap.
FIG. 26 is a diagram showing the distribution of tap strengths of main body taps and face taps.
FIG. 27 is a flowchart explaining the flow of processing executed by the earphone.
Hereinafter, modes for implementing the present technology will be described. The description will be given in the following order.
1. First embodiment
2. Second embodiment
3. Third embodiment
4. Modification examples
<1. First Embodiment>
- Appearance of the earphone
FIG. 1 is a diagram showing an example of the appearance of an earphone (inner-ear headphone) 10 according to an embodiment of the present technology.
The earphone 10 is a sound output device that is worn on the user's ear and used to listen to the sound output from a built-in driver.

In FIG. 1, the earphone 10 has a left ear terminal 10L and a right ear terminal 10R. The earphone 10 is a left-right independent earphone in which the left ear terminal 10L and the right ear terminal 10R are not physically connected. The left ear terminal 10L and the right ear terminal 10R are connected via a wireless communication path such as NFMI (Near Field Magnetic Induction).

Each of the left ear terminal 10L and the right ear terminal 10R is equipped with a processing device such as a CPU (Central Processing Unit), an acceleration sensor, and a sound output device.

The user can operate the earphone 10 by tapping around the ear on which the earphone 10 is worn. Normally, an operation of the earphone 10 by tapping around the ear is performed at a given time on either the left ear side or the right ear side. The left ear terminal 10L and the right ear terminal 10R detect the vibration of the tap with the acceleration sensor and notify the user that the tap has been detected by outputting a sound effect. Since the operation uses vibration detection, the earphone 10 can naturally be operated not only by tapping around the ear but also by tapping the main body of the earphone 10.

The earphone 10 is an example of an information processing device to which the present technology is applied. The earphone 10 may also be configured as true wireless earbuds.
- Configuration example of the earphone
FIG. 2 is a block diagram showing a hardware configuration example of the earphone 10.
As shown in FIG. 2, the left ear terminal 10L includes a CPU 101L, a ROM (Read Only Memory) 102L, a RAM (Random Access Memory) 103L, a bus 104L, an input/output I/F unit 105L, a sound output unit 106L, a sensor unit 107L, a communication unit 108L, a storage unit 109L, and a power supply unit 110L.

The CPU 101L, the ROM 102L, and the RAM 103L are connected to each other by the bus 104L.

The input/output I/F unit 105L is further connected to the bus 104L. The sound output unit 106L, the sensor unit 107L, the communication unit 108L, and the storage unit 109L are connected to the input/output I/F unit 105L.

The sound output unit 106L reproduces, for example, music data acquired from an external music playback device and outputs the sound. The sound output unit 106L also outputs a sound effect indicating that an operation has been detected.

The sensor unit 107L is composed of an IMU (Inertial Measurement Unit) 121L.

The IMU 121L is composed of an acceleration sensor, a gyro sensor, and the like. The IMU 121L detects the acceleration, angular acceleration, and the like of the left ear terminal 10L and outputs them as sensor data.

The communication unit 108L is composed of a left-right communication unit 131L and an external communication unit 132L.

The left-right communication unit 131L is configured as a communication module that supports short-range wireless communication such as NFMI. The left-right communication unit 131L communicates with the right ear terminal 10R and exchanges music data, sensor data, and the like.

The external communication unit 132L is configured as a communication module that supports wireless communication such as Bluetooth (registered trademark), wireless LAN (Local Area Network), or cellular communication (for example, LTE-Advanced or 5G), or wired communication. The external communication unit 132L communicates with external devices and exchanges sound signals, sensor data, and the like. The external devices here include, for example, smartphones, tablet terminals, personal computers, servers, and music playback devices. The servers include servers provided by music distribution services that distribute music over the Internet.

The storage unit 109L is composed of, for example, a semiconductor memory including a non-volatile memory or a volatile memory. Music data and the like acquired from an external device are recorded in the storage unit 109L.

The power supply unit 110L has a battery. The power supply unit 110L supplies power to each unit of the left ear terminal 10L.

The right ear terminal 10R has the same configuration as the left ear terminal 10L. In the right ear terminal 10R, blocks corresponding to the components of the left ear terminal 10L are denoted by the same numbers followed by "R", and duplicate descriptions are omitted. Further, when there is no need to distinguish whether a component is provided in the left ear terminal 10L or the right ear terminal 10R, "L" or "R" is omitted in the description.
- Functional configuration example of the earphone
FIG. 3 is a block diagram showing a functional configuration example of the earphone 10.
As shown in FIG. 3, in the earphone 10, an information processing unit 141 is realized by the CPU 101 of FIG. 2 executing a predetermined program.

Note that the configuration shown in FIG. 3 is provided in each of the left ear terminal 10L and the right ear terminal 10R. The "other terminal" referred to below is the right ear terminal 10R when the information processing unit 141 of FIG. 3 belongs to the left ear terminal 10L, and the left ear terminal 10L when it belongs to the right ear terminal 10R. Hereinafter, components provided in the left ear terminal 10L are denoted by the same reference numerals followed by "L", and components provided in the right ear terminal 10R are denoted by the same reference numerals followed by "R". The same applies to FIGS. 12, 18, and 24 described later.

The information processing unit 141 is composed of a one-ear tap detection unit 151, a transmission control unit 152, a reception control unit 153, a tap determination unit 154, a sound control unit 155, and an execution unit 156.

The one-ear tap detection unit 151 acquires sensor data from the IMU 121 and detects a tap based on the sensor data. Here, the one-ear tap detection unit 151 detects the presence or absence of a tap operation from vibration generated by the user's movements, including tap operations and other movements. Movements other than the tap operation include, for example, walking and chewing. Note that a tap operation refers to the user's movement (for example, a movement in which the user's fingertip touches the main body of the earphone 10), and a tap refers to the vibration detected in the earphone 10 as having been caused by the user's tap operation.

When a tap is detected, the one-ear tap detection unit 151 supplies event information indicating that the tap has been detected to the transmission control unit 152. The one-ear tap detection unit 151 also supplies the sensor data acquired from the IMU 121 or the event information to the tap determination unit 154.

Further, the one-ear tap detection unit 151 supplies an output request for an operation recognition sound to the sound control unit 155. The operation recognition sound is a sound effect indicating that a tap has been detected.

The transmission control unit 152 is supplied from the IMU 121 with the same information as the sensor data acquired by the one-ear tap detection unit 151. The transmission control unit 152 controls the left-right communication unit 131 to transmit the event information supplied from the one-ear tap detection unit 151 and the sensor data supplied from the IMU 121 to the other terminal.

The reception control unit 153 acquires the event information and sensor data transmitted from the other terminal via the left-right communication unit 131 and supplies them to the tap determination unit 154. Event information transmitted from the other terminal indicates that a tap has been detected in the other terminal.
The tap determination unit 154 determines, based on the sensor data supplied from the one-ear tap detection unit 151 and the sensor data supplied from the reception control unit 153, whether the event in which a tap was detected by the one-ear tap detection unit 151 is a tap event or a tap-like event.

A tap event is an event indicating that a tap operation has been detected. A tap-like event is an event indicating that a movement that causes vibration, other than a tap operation, has been detected. In other words, the tap determination unit 154 is a determination unit that determines whether or not the detection of vibration by the IMU 121 is noise, such as the detection of vibration caused by a movement other than a tap operation.

When the tap determination unit 154 determines that the event in which a tap was detected by the one-ear tap detection unit 151 is a tap-like event, it supplies an output request for a cancellation sound to the sound control unit 155. The cancellation sound is a sound indicating that the detection of the tap has been cancelled.

When the tap determination unit 154 determines that the event in which a tap was detected by the one-ear tap detection unit 151 is a tap event, it supplies an output request for a function execution sound to the sound control unit 155. The function execution sound is a sound indicating that the function assigned to the tap event is to be executed. The tap determination unit 154 also controls the execution unit 156 to execute the function assigned to the tap event.

The sound control unit 155 causes the sound output unit 106 to output sound effects in response to output requests such as those for the operation recognition sound supplied from the one-ear tap detection unit 151 and for the cancellation sound and function execution sound supplied from the tap determination unit 154.

The execution unit 156 executes the function assigned to the tap event under the control of the tap determination unit 154. Processing related to content playback, such as starting music playback or skipping to the next track, is assigned to the tap event, for example.
Incidentally, in order to determine a tap event using the sensor data of the left ear terminal 10L and the right ear terminal 10R, time is required for the wireless communication that transmits and receives the sensor data between the two terminals. In addition, since music data for stereo playback is constantly being transmitted between the two terminals, there is a possibility that the sensor data cannot be sent.

If the response to a tap operation takes a long time because the wireless communication takes time or the sensor data is not transmitted, the user is given the impression that the performance of the earphone 10 is poor. For this reason, it is desirable for the response sound to a tap operation to be played quickly. Playing the response sound only after the tap event has been determined based on the sensor data of both terminals is therefore undesirable in terms of response speed.

In the present technology, tap detection is performed separately in each of the two terminals, and when a tap is detected, the operation recognition sound is immediately played as the response sound to the tap operation. This makes it possible to notify the user that the earphone is responding immediately to the tap operation.

Further, in the present technology, if the event in which a tap was detected is determined to be a tap-like event before a predetermined timeout period elapses after the operation recognition sound is played, the detection of the tap is cancelled and the cancellation sound is played. This makes it possible to notify the user that the earphone responded to a tap but that the response was cancelled.

FIG. 4 is a diagram showing an example of the playback timing of the sound effects.

A of FIG. 4 shows the playback timing of the sound effects when a tap operation is performed by the user. As shown in A of FIG. 4, the operation recognition sound is played at time t1, immediately after the tap is detected.

The function execution sound is played at time t2, when the predetermined timeout period has elapsed from time t1. The function assigned to the tap event is executed at the timing of time t2.

B of FIG. 4 shows the playback timing of the sound effects when a movement other than a tap operation is performed by the user. As shown in B of FIG. 4, the operation recognition sound is played at time t1, immediately after the tap is detected.

When information used for determining to cancel the detection of the tap is received from the other terminal at time t3, which is between time t1 and time t2, the detection of the tap is cancelled. For example, event information or sensor data is transmitted from the other terminal as the information used for determining to cancel the tap.

The cancellation sound is played at time t4, which is between time t3 and time t2. Note that the earphone may also be configured not to play the cancellation sound even when the detection of the tap is cancelled.

As described above, in the earphone 10, when a tap operation is performed, the operation recognition sound is played immediately after the tap is detected, and the function execution sound is played after the predetermined timeout period has elapsed.

The operation recognition sound is always played when a tap is detected in either the left ear terminal 10L or the right ear terminal 10R. On the other hand, the function execution sound is played only when information for cancelling the detection of the tap has not been transmitted from the other terminal before the timeout period elapses.
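As a rough illustration of this immediate-feedback and cancellation timing, the following is a minimal sketch in Python. The callback names (play_sound, execute_function), the timeout value, and the use of a timer thread are all assumptions for illustration; the specification does not prescribe an implementation.

```python
# A minimal sketch of the feedback timing shown in FIG. 4, assuming
# hypothetical callbacks play_sound() and execute_function(). The timeout
# value is illustrative; the specification only calls it a predetermined time.
import threading

TIMEOUT_SEC = 0.3  # illustrative timeout between t1 and t2

class TapFeedback:
    def __init__(self, play_sound, execute_function):
        self.play_sound = play_sound
        self.execute_function = execute_function
        self.cancelled = False
        self.timer = None

    def on_tap_detected(self):            # time t1
        self.cancelled = False
        self.play_sound("operation_recognition")
        self.timer = threading.Timer(TIMEOUT_SEC, self.on_timeout)
        self.timer.start()

    def on_cancel_info_received(self):    # time t3: info from the other terminal
        self.cancelled = True
        if self.timer:
            self.timer.cancel()
        self.play_sound("cancel")         # time t4

    def on_timeout(self):                 # time t2
        if not self.cancelled:
            self.play_sound("function_execution")
            self.execute_function()
```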
- Operation of the earphone
Here, the operation of the earphone 10 having the above configuration will be described.
A tap is detected in each of the left ear terminal 10L and the right ear terminal 10R, and it is determined, using the information of both terminals, whether the event in which a tap was detected is a tap event or a tap-like event. Two methods are conceivable for exchanging the information used to determine whether the event in which a tap was detected is a tap event or a tap-like event.
1. First method: a method in which one terminal transmits event information indicating that a tap has been detected to the other terminal
2. Second method: a method in which one terminal transmits its sensor data to the other terminal
(About the first method)
An example will be described in which it is determined whether or not the detection of a tap is a tap event based on the event information transmitted by the first method.

Note that, in the following, a case where information is transmitted from the left ear terminal 10L to the right ear terminal 10R will be described. Conversely, information may be transmitted from the right ear terminal 10R to the left ear terminal 10L, and information may also be transmitted mutually between the two terminals.
FIG. 5 is a diagram showing the flow of information in the first method.

As shown on the left side of FIG. 5, in the left ear terminal 10L, the sensor data of the IMU 121L is monitored by a one-ear tap detection unit 151L. When a tap is detected by the one-ear tap detection unit 151L, event information is transmitted to a tap determination unit 154R of the right ear terminal 10R.

As shown on the right side of FIG. 5, in the right ear terminal 10R as well, the sensor data of the IMU 121R is monitored by a one-ear tap detection unit 151R. When a tap is detected by the one-ear tap detection unit 151R, event information is supplied from the one-ear tap detection unit 151R to the tap determination unit 154R.

The tap determination unit 154R determines whether or not the event in which a tap was detected by the one-ear tap detection unit 151R is a tap event, based on the event information transmitted from the left ear terminal 10L and the event information supplied from the one-ear tap detection unit 151R.

In the right ear terminal 10R, the function execution sound or the cancellation sound is played based on the determination result by the tap determination unit 154R, as described above.
The flow of processing executed by the earphone 10 (information processing unit 141) will be described with reference to the flowchart of FIG. 6.

In step S101, the one-ear tap detection unit 151R of the right ear terminal 10R detects a tap based on the sensor data of the IMU 121R.

In step S102, a sound control unit 155R of the right ear terminal 10R plays the operation recognition sound and causes a sound output unit 106R to output it.

In step S103, the tap determination unit 154R determines whether or not event information indicating that a similar tap event has been detected has been received from the left ear terminal 10L within a certain period.

If it is determined in step S103 that similar event information has been received from the left ear terminal 10L within the certain period, the process proceeds to step S104. Here, since a tap has been detected by the one-ear tap detection unit 151R, it is determined that similar event information has been received when event information likewise indicating that a tap has been detected is received from the left ear terminal 10L. In this case, the tap determination unit 154R determines that the event in which a tap was detected by the one-ear tap detection unit 151R is a tap-like event. That is, the tap determination unit 154R determines that the event in which a tap was detected by the one-ear tap detection unit 151R is invalid.

In step S104, the sound control unit 155R plays the cancellation sound and causes the sound output unit 106R to output it. The cancellation sound may be output simultaneously from the sound output units 106 of both terminals. After the cancellation sound is output, the process ends.

On the other hand, if it is determined in step S103 that similar event information has not been received from the left ear terminal 10L within the certain period, the process proceeds to step S105. In this case, the tap determination unit 154R determines that the event in which a tap was detected by the one-ear tap detection unit 151R is a tap event. That is, the tap determination unit 154R determines that the event in which a tap was detected by the one-ear tap detection unit 151R is valid.

In step S105, the sound control unit 155R plays the function execution sound and causes the sound output unit 106R to output it. The function execution sound may be output simultaneously from the sound output units 106 of both terminals.

In step S106, an execution unit 156R executes a predetermined function according to the tap event. After the predetermined function is executed, the process ends.
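The determination in steps S103 to S106 can be sketched as a simple coincidence test on event timestamps. The function and parameter names below, and the length of the "certain period", are illustrative assumptions, not values from the specification.

```python
# A minimal sketch of the first method (steps S101 to S106), assuming
# hypothetical timestamps in seconds: local_tap_time for the tap detected on
# this terminal, and remote_event_times for event information received from
# the other terminal. The window length is an illustrative assumption.
WINDOW_SEC = 0.2  # illustrative "certain period" for coincident events

def is_tap_event(local_tap_time: float, remote_event_times: list[float]) -> bool:
    """Return True for a valid tap event, False for a tap-like event (noise)."""
    for t in remote_event_times:
        if abs(t - local_tap_time) <= WINDOW_SEC:
            # Both terminals reported a tap at almost the same time, which
            # suggests head-wide vibration such as walking or chewing
            # (S103 -> S104: cancel the detection).
            return False
    # No similar event arrived from the other side within the period:
    # treat the detection as a valid tap (S105, S106).
    return True
```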
(About the second method)
Next, an example will be described in which it is determined whether or not the event in which a tap was detected is a tap event based on the sensor data transmitted by the second method.

FIG. 7 is a diagram showing the flow of information in the second method.
As shown on the left side of FIG. 7, in the left ear terminal 10L, the values of the sensor data of the IMU 121L are transmitted to the tap determination unit 154R of the right ear terminal 10R. The transmission of the sensor data to the tap determination unit 154R is performed, for example, when a condition is satisfied, such as the value of the sensor data of the IMU 121L exceeding a predetermined threshold value after a predetermined filter is applied. Note that the values of the sensor data of the IMU 121L may instead be constantly transmitted to the tap determination unit 154R.

As shown on the right side of FIG. 7, in the right ear terminal 10R, the sensor data of the IMU 121R is monitored by the one-ear tap detection unit 151R. When a tap is detected by the one-ear tap detection unit 151R, the sensor data of the IMU 121R is supplied from the one-ear tap detection unit 151R to the tap determination unit 154R.

The tap determination unit 154R determines whether or not the event in which a tap was detected by the one-ear tap detection unit 151R is a tap event, based on the sensor data transmitted from the left ear terminal 10L and the sensor data supplied from the one-ear tap detection unit 151R.

In the right ear terminal 10R, the function execution sound or the cancellation sound is played based on the determination result by the tap determination unit 154R, as described above.
The flow of processing executed by the earphone 10 (information processing unit 141) will be described with reference to the flowchart of FIG. 8.

In step S151, the one-ear tap detection unit 151R of the right ear terminal 10R detects a tap based on the sensor data of the IMU 121R.

In step S152, the sound control unit 155R of the right ear terminal 10R plays the operation recognition sound and causes the sound output unit 106R to output it.

In step S153, a reception control unit 153R receives the values of the sensor data transmitted from the left ear terminal 10L via a left-right communication unit 131R.

In step S154, the tap determination unit 154R performs tap determination processing. Specifically, the tap determination unit 154R calculates the degree of similarity between the values of the sensor data of the IMU 121R and the values of the sensor data of the IMU 121L transmitted from the left ear terminal 10L, and determines, based on the calculated degree of similarity, whether the event in which a tap was detected by the one-ear tap detection unit 151R is a tap event or a tap-like event. The determination based on the degree of similarity of the sensor data values will be described later.

If it is determined in step S155, through the tap determination processing of step S154 using the sensor data of the left ear terminal 10L and the sensor data of the right ear terminal 10R, that the event in which a tap was detected by the one-ear tap detection unit 151R was not a tap event, that is, that it was a tap-like event, the process proceeds to step S156.

In step S156, the sound control unit 155R plays the cancellation sound and causes the sound output unit 106R to output it. The cancellation sound may be output simultaneously from the sound output units 106 of both terminals. After the cancellation sound is output, the process ends.

On the other hand, if it is determined in step S155 that the event in which a tap was detected by the one-ear tap detection unit 151R in the tap determination processing of step S154 was a tap event, the process proceeds to step S157.

In step S157, the sound control unit 155R plays the function execution sound and causes the sound output unit 106R to output it. The function execution sound may be output simultaneously from the sound output units 106 of both terminals.

In step S158, the execution unit 156R executes a predetermined function according to the tap event. After the predetermined function is executed, the process ends.
(About determination based on the similarity of sensor data)
Vibration generated by movements other than a tap operation is often detected simultaneously by the sensors of both the left ear terminal 10L and the right ear terminal 10R. For example, walking and chewing cause the user's entire head to vibrate. In contrast, vibration due to a tap is detected as a large vibration only in either the left ear terminal 10L or the right ear terminal 10R.

For this reason, when the degree of similarity between the sensor data of the sensors mounted on the two terminals is higher than a predetermined threshold value, the tap determination unit 154 determines that the detection of a tap by the one-ear tap detection unit 151 is a tap-like event.
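As one possible realization of this determination, the sketch below compares short acceleration windows from the two terminals using a normalized correlation. The specification does not fix a particular similarity measure, so the measure and the threshold value here are assumptions for illustration.

```python
# A minimal sketch of the similarity test described above, assuming the two
# terminals exchange short acceleration windows sampled at the same rate.
# Normalized correlation is one plausible similarity measure; the threshold
# is illustrative.
import numpy as np

SIMILARITY_THRESHOLD = 0.7

def is_tap_like(left_window: np.ndarray, right_window: np.ndarray) -> bool:
    """Return True (tap-like / noise) when both sensors saw similar vibration."""
    # Zero-mean, unit-variance normalization of each window.
    l = (left_window - left_window.mean()) / (left_window.std() + 1e-9)
    r = (right_window - right_window.mean()) / (right_window.std() + 1e-9)
    # Sample correlation coefficient, roughly in [-1, 1].
    similarity = float(np.dot(l, r)) / len(l)
    # High similarity on both ears suggests head-wide vibration such as
    # walking or chewing rather than a one-sided tap.
    return similarity > SIMILARITY_THRESHOLD
```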
FIG. 9 is a diagram showing an example of vibration caused by a movement other than a tap operation.

FIG. 9 shows the waveform of vibration caused by the user chewing during a meal. In FIG. 9, the vertical axis represents acceleration, and the horizontal axis represents time. The same applies to the graphs of FIG. 10 described later. Note that a predetermined band-pass filter has been applied to the waveforms shown in FIG. 9.

A of FIG. 9 shows the waveform of the sensor data of the IMU 121L mounted on the left ear terminal 10L, and B of FIG. 9 shows the waveform of the sensor data of the IMU 121R mounted on the right ear terminal 10R.

Comparing A and B of FIG. 9, it can be seen that when the user is eating, vibration is detected simultaneously by the sensors mounted on the left ear terminal 10L and the right ear terminal 10R. Therefore, the earphone 10 can detect that the user has chewed, based on the sensor data of the sensors mounted on the left ear terminal 10L and the right ear terminal 10R.
FIG. 10 is a diagram showing another example of vibration caused by a movement other than a tap operation.

FIG. 10 shows the waveform of vibration caused by the user walking. Note that a predetermined band-pass filter has been applied to the waveforms shown in FIG. 10.

A of FIG. 10 shows the waveform of the sensor data of the IMU 121L mounted on the left ear terminal 10L, and the graph of B of FIG. 10 shows the waveform of the sensor data of the IMU 121R mounted on the right ear terminal 10R.

Comparing A and B of FIG. 10, it can be seen that when the user is walking or running, vibration is detected simultaneously by the sensors mounted on the left ear terminal 10L and the right ear terminal 10R. Therefore, the earphone 10 can detect that the user is walking or running, based on the sensor data of the sensors mounted on the left ear terminal 10L and the right ear terminal 10R.

As described above, since the earphone 10 can detect the user's chewing, walking, and the like based on the sensor data of both terminals, it can distinguish a tap operation, which, unlike chewing or walking, is performed on only one of the left ear side and the right ear side.
As described above, in the earphone 10, a tap operation by the user is detected based on the information of the sensors mounted on each of the left ear terminal 10L and the right ear terminal 10R. In addition, in the earphone 10, the detection of a tap caused by a movement other than a tap operation is cancelled after the operation recognition sound is played.

Since the detection of a tap caused by a movement other than a tap operation is cancelled based on the information of the sensors mounted on both terminals, the earphone 10 can improve the accuracy of tap operation detection compared to detecting a tap operation based on the information of the sensor mounted on only one terminal.

Even if a tap is detected for a movement other than a tap operation, the operation recognition sound is played; however, such an erroneous tap detection is immediately cancelled and no function is executed, so the user's sense of operation is not impaired.

If sensor information were constantly transmitted between the two terminals, the band used for transmitting music data would be narrowed, and sound interruptions would occur when the radio wave conditions are poor. In the earphone 10, sensor information needs to be transmitted only when a tap is detected. By transmitting sensor information only when a tap is detected, so that the transmission of music data between the two terminals is prioritized, it is possible to reduce the sound interruptions caused by transmitting sensor information.

Since the operation recognition sound is played immediately when a tap is detected, the earphone 10 can quickly provide feedback on the user's operation. By hearing the operation recognition sound, the user can immediately confirm that the operation has been detected. In addition, by hearing the cancellation sound, the user can confirm that an erroneous tap detection caused by a movement not intended as a tap has been cancelled.
・変形例
 操作認識音が再生されてから機能実行音が再生されるまでの間にユーザによるキャンセル操作が受け付けられるようにしてもよい。
-Modification example The cancel operation by the user may be accepted between the time when the operation recognition sound is played and the time when the function execution sound is played.
 ユーザは、誤ってタップ操作を行った場合やタップ動作以外の動作が誤検出されたと判断した場合、機能実行音が再生されるまでの間のタイミングでキャンセル操作を行うことによって、イヤホン10によるタップの検出を取り消すことができる。 When the user mistakenly performs a tap operation or determines that an operation other than the tap operation is erroneously detected, the user taps with the earphone 10 by performing a cancel operation at the timing until the function execution sound is played. Detection can be canceled.
 図11は、イヤホン10の使用方法の例を示す図である。 FIG. 11 is a diagram showing an example of how to use the earphone 10.
 図11のAの例では、左耳用端末10LがユーザUの左耳に装着され、右耳用端末10RがユーザUの右耳に装着されている。 In the example of A in FIG. 11, the left ear terminal 10L is attached to the left ear of the user U, and the right ear terminal 10R is attached to the right ear of the user U.
 図11のBの例では、右耳用端末10Rが単体で使用され、ユーザUの右耳に装着されている。 In the example of B in FIG. 11, the right ear terminal 10R is used alone and is attached to the right ear of the user U.
 図11のBに示すように、右耳用端末10Rと左耳用端末10Lのうちのいずれかの端末だけが使用される場合、上述したようなタップの検出に対するイヤホン10の挙動は一部変更となる。 As shown in B of FIG. 11, when only one of the right ear terminal 10R and the left ear terminal 10L is used, the behavior of the earphone 10 with respect to the tap detection as described above is partially changed. It becomes.
 例えば、右耳用端末10Rだけが使用される場合、左耳用端末10Lから送信されてくる情報に基づくタップの検出の取り消しを行うことができない。このため、タップ動作以外の動作を右耳用端末10Rにおいてタップとして検出してしまうなどの、ノイズの影響を受けやすくなる。 For example, when only the right ear terminal 10R is used, it is not possible to cancel the detection of the tap based on the information transmitted from the left ear terminal 10L. Therefore, an operation other than the tap operation is easily affected by noise, such as being detected as a tap on the right ear terminal 10R.
 このため、右耳用端末10Rの動作を変更する。変更の例として、例えば、右耳用端末10Rにおいては、タップが検出されたものとして判定する加速度のしきい値がより大きい値に設定される。これにより、右耳用端末10Rの片方だけを使用する場合であっても、タップ動作以外の動作をタップとして検出してしまうといった誤検出を減らすことが可能となる。なお、加速度センサの値に様々な算術処理や機械学習による処理を行った評価値に対してしきい値判定が行われるようにしてもよい。 Therefore, the operation of the right ear terminal 10R is changed. As an example of the change, for example, in the right ear terminal 10R, the threshold value of the acceleration determined that the tap is detected is set to a larger value. This makes it possible to reduce erroneous detection such that an operation other than the tap operation is detected as a tap even when only one of the right ear terminals 10R is used. It should be noted that the threshold value may be determined for the evaluation value obtained by performing various arithmetic processing or machine learning processing on the value of the acceleration sensor.
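The following is a minimal sketch of this threshold change; the concrete threshold values are illustrative assumptions, since the actual values used by the earphone 10 are not specified here.

```python
# When only one terminal is worn, the acceleration threshold for accepting a
# tap is raised, because cross-checking against the other terminal's sensor
# is unavailable and noise is more likely to slip through.
BOTH_EARS_THRESHOLD = 2.0   # illustrative acceleration thresholds
SINGLE_EAR_THRESHOLD = 3.5  # stricter when no cross-check is possible

def tap_threshold(both_terminals_in_use):
    return BOTH_EARS_THRESHOLD if both_terminals_in_use else SINGLE_EAR_THRESHOLD

def is_tap(acceleration_norm, both_terminals_in_use):
    return acceleration_norm > tap_threshold(both_terminals_in_use)

print(is_tap(2.8, both_terminals_in_use=True))   # True: accepted as a tap
print(is_tap(2.8, both_terminals_in_use=False))  # False: rejected as possible noise
```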
When both terminals are used, the function assigned to a tap event on the right ear terminal 10R and the function assigned to a tap event on the left ear terminal 10L can be different functions. On the other hand, when only the right ear terminal 10R is used, a function different from the one assigned when both terminals are in use may be assigned to the tap event on the right ear terminal 10R.
<2. Second Embodiment>
When a tap is detected using the sensor data of the IMU 121, the detection may be affected by noise, such as a motion other than the user's tap operation being detected, or vibration generated by music playback being detected.
Vibration corresponding to noise can be predicted based on the periodicity of noise detected in the past, the waveform of the music data to be played, and the like; the noise can then be reduced by subtracting the acceleration value of the noise from the sensor data, or by excluding the timings at which the noise occurs from the tap detection period. By reducing the noise, the earphone 10 can improve the accuracy of detecting the user's tap operation.
Two examples of ways to reduce noise are described below.
1. First example: the time variation of noise caused by walking or mastication is recorded, and the noise is canceled when a tap is detected.
2. Second example: vibration is predicted based on the sound signal to be played, and the predicted vibration is canceled.
- First example
FIG. 12 is a block diagram showing a functional configuration example of the earphone 10 in the first example.
As shown in FIG. 12, in the earphone 10, an information processing unit 161 is realized by the CPU 101 of FIG. 2 executing a predetermined program. The configuration shown in FIG. 12 can be provided in each of the left ear terminal 10L and the right ear terminal 10R, or in only one of them. The same applies to FIG. 18, described later.
The information processing unit 161 includes a tap detection unit 171 and an execution unit 172.
The tap detection unit 171 detects a tap based on the sensor data of the IMU 121 and past sensor data. The tap detection unit 171 includes an acquisition unit 181, a calculation unit 182, and a determination unit 183.
The acquisition unit 181 acquires the sensor data from the IMU 121 and detects vibration peaks based on the sensor data. The acquisition unit 181 also acquires a history of vibration peaks detected in the past. For example, information representing the peaks previously detected by the acquisition unit 181 is stored as a history in the storage unit 109 of FIG. 2 and is read out by the acquisition unit 181.
The calculation unit 182 calculates peak intervals based on the history and the sensor data acquired by the acquisition unit 181.
The determination unit 183 determines whether to recognize a peak detected by the acquisition unit 181 as a tap based on the calculation result from the calculation unit 182. That is, the determination unit 183 functions as a recognition unit that recognizes taps based on the calculation result from the calculation unit 182.
The execution unit 172 executes processing corresponding to the event that a tap has been detected by the tap detection unit 171.
A tap recognition method based on peak intervals will be described with reference to FIGS. 13 to 15.
In general, the waveform of the vibration generated by a user's tap operation may resemble the waveform of the vibration generated when a walking user's foot lands. The waveform of the vibration generated by landing depends on the shoes the user is wearing and on the user's gait. For example, when a user wearing hard-soled shoes such as high heels lands forcefully, vibration similar to that generated by a tap operation occurs.
FIG. 13 is a diagram showing an example of the waveform of vibration generated by landing during walking.
Since walking is a periodic motion, the vibration generated by landing during walking has periodicity. As shown in FIG. 13, the waveform of the vibration generated by landing during walking is a waveform W11 that shows peaks at a constant period T.
Because peaks are detected periodically, the earphone 10 can predict the timing of landings during walking. Although walking habits differ from person to person, the earphone 10 can predict the next vibration based on vibrations detected in the past.
The earphone 10 records a periodic vibration pattern such as walking and predicts the timing of the next landing based on the vibration pattern. The earphone 10 does not recognize, as a tap, vibration detected around the time predicted to be a landing.
FIG. 14 is a diagram showing an example of the waveform of the sensor data of the IMU 121 when vibration generated by landing during walking is detected.
The solid line portion of the waveform W12 in FIG. 14 represents vibration detected in the past by the IMU 121. The earphone 10 applies a low-pass filter to the acceleration values serving as the sensor data of the IMU 121 and calculates the history of peaks within a certain period.
For example, a time at which the norm of the acceleration exceeds a predetermined threshold value and takes the maximum value within a certain window is calculated as a peak timing. In FIG. 14, three peaks are detected. The interval from the first peak to the second peak is T0, and the interval from the second peak to the third peak is T1.
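One plausible realization of this peak extraction is sketched below: the sequence of acceleration norms is smoothed with a simple one-pole low-pass filter, and a sample is taken as a peak when it exceeds a threshold and is the maximum within a surrounding window. The filter coefficient, threshold, and window size are illustrative assumptions.

```python
def find_peaks(norms, alpha=0.5, thresh=1.5, half_window=2):
    # Smooth the acceleration norms with a one-pole low-pass filter.
    smoothed, prev = [], 0.0
    for x in norms:
        prev = alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    # A peak is a sample above the threshold that is also the maximum
    # within a window of +/- half_window samples around it.
    peaks = []
    for i, v in enumerate(smoothed):
        lo, hi = max(0, i - half_window), min(len(smoothed), i + half_window + 1)
        if v > thresh and v == max(smoothed[lo:hi]):
            peaks.append(i)   # index of the detected peak timing
    return peaks

# Two spikes in the input produce two peak timings (shifted by the filter lag).
print(find_peaks([0.2, 0.3, 4.0, 3.0, 0.4, 0.2, 4.2, 2.8, 0.3]))  # [3, 7]
```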
When a peak Pa that may be a tap is newly detected, the calculation unit 182 calculates the interval between the peak Pa and the most recent peak. The broken line portion of the waveform W12 in FIG. 14 represents the vibration including the newly detected peak Pa. In FIG. 14, the interval between the peak Pa and the most recent peak is calculated to be Ta.
When the interval Ta is close to the intervals T0 and T1 of the peaks detected in the past, the determination unit 183 recognizes the peak Pa not as a tap but as noise.
FIG. 15 is a diagram showing the waveform of the sensor data of the IMU 121 when vibration generated by a tap operation performed after walking is detected.
The solid line portion of the waveform W13 in FIG. 15 represents vibration detected in the past by the IMU 121. For the waveform W13, the peak history is calculated in the same manner as for the waveform W12 of FIG. 14.
When a peak Pb that may be a tap is newly detected, the calculation unit 182 calculates the interval between the peak Pb and the most recent peak. The broken line portion of the waveform W13 in FIG. 15 represents the vibration including the newly detected peak Pb. In FIG. 15, the interval between the peak Pb and the most recent peak is calculated to be Tb.
When the interval Tb is short compared with the intervals T0 and T1, the determination unit 183 recognizes the vibration including the peak Pb as likely to be a tap.
As described above, the tap detection unit 171 determines whether to recognize vibration including a newly detected peak as a tap based on the intervals between peaks detected in the past and the interval to the newly detected peak.
For vibrations generated by motions other than walking that also have periodicity and individual variation, such as mastication, whether to recognize vibration including a newly detected peak as a tap is likewise determined based on the peak intervals, in the same manner as for the vibration generated by walking described above.
The flow of processing executed by the information processing unit 161 (FIG. 12) will be described with reference to the flowchart of FIG. 16.
In step S201, the acquisition unit 181 acquires the history of vibration peaks detected in the past.
In step S202, the calculation unit 182 calculates the average T_ave of the intervals between the vibration peaks detected in the past.
In step S203, the acquisition unit 181 detects a new peak Pa.
In step S204, the calculation unit 182 calculates the interval T_new between the peak Pa and the peak immediately preceding it.
In step S205, the determination unit 183 determines whether the difference between the average interval T_ave and the interval T_new is equal to or less than a threshold value THRES, that is, whether |T_new - T_ave| ≤ THRES.
If it is determined in step S205 that the difference between the average interval T_ave and the interval T_new is equal to or less than the threshold value, the processing proceeds to step S206.
In step S206, the determination unit 183 recognizes the vibration including the peak Pa not as a tap but as noise. The processing then ends.
On the other hand, if it is determined in step S205 that the difference between the average interval T_ave and the interval T_new exceeds the threshold value, the processing proceeds to step S207.
In step S207, the determination unit 183 recognizes the vibration including the peak Pa as vibration that may be a tap, and the tap detection unit 171 then executes the tap detection algorithm. By executing the tap detection algorithm, the tap detection unit 171 detects a tap based on the sensor data of the IMU 121.
When a tap is detected by the tap detection unit 171, the execution unit 172 executes processing corresponding to the event that the tap detection unit 171 has detected a tap.
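A compact sketch of this flow is shown below, under the assumption that the peak history is a list of peak times in seconds and that THRES is a small allowed deviation from the walking period; the concrete values are illustrative.

```python
THRES = 0.08  # allowed deviation from the walking period, in seconds (illustrative)

def classify_new_peak(peak_history, new_peak_time):
    intervals = [b - a for a, b in zip(peak_history, peak_history[1:])]
    t_ave = sum(intervals) / len(intervals)   # average past interval (step S202)
    t_new = new_peak_time - peak_history[-1]  # interval to the new peak (step S204)
    if abs(t_new - t_ave) <= THRES:           # periodic: walking noise (S205, S206)
        return "noise"
    return "tap candidate"                    # aperiodic: run tap detection (S207)

history = [0.00, 0.62, 1.23, 1.85]        # roughly periodic landings
print(classify_new_peak(history, 2.47))   # close to the period -> "noise"
print(classify_new_peak(history, 2.05))   # off the period -> "tap candidate"
```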
As described above, in the earphone 10, whether vibration including a newly detected peak is vibration generated by a tap operation is recognized based on the intervals between vibration peaks detected in the past and the interval to the newly detected peak. When the vibration including the newly detected peak is recognized as noise, the noise is not detected as a tap.
This makes it possible for the earphone 10 to avoid erroneously detecting vibration generated by walking as a tap, while reliably detecting a tap operation performed at a timing different from a landing during walking.
Note that the tap detection unit 171 of FIG. 12 may be provided in the information processing unit 141 instead of the one-ear tap detection unit 151 of FIG. 3. In this case, when a tap is detected by the tap detection unit 171, whether the event that the tap detection unit 171 has detected a tap is a tap event is determined by the tap determination unit 154 based on the information transmitted from the other terminal.
In this case, after the processing of step S207 of FIG. 16 is performed, the processing described with reference to FIGS. 6 and 8 is performed, whereby it is determined whether the event that the tap detection unit 171 has detected a tap is a tap event.
Also in this case, after the processing of step S206 of FIG. 16 is performed, event information and sensor data may be withheld from the other terminal, or information indicating that vibration generated by a motion that appears to be walking has been detected may be transmitted to the other terminal. This makes it possible to reduce the amount of data communicated between the left ear terminal 10L and the right ear terminal 10R.
Although an example of recognizing whether vibration including a peak is a tap based on the periodicity of a motion such as walking has been described, the tap may instead be recognized based on the result of learning the vibration pattern of walking.
For example, the vibration generated by a single landing during walking is learned by a technique such as machine learning, and whether a newly detected vibration peak is a peak generated by a landing during walking is recognized based on the learning result.
When machine learning is performed, the sensor data serving as teacher data may be labeled with the user's behavior detected by a sensor other than the IMU 121.
For example, when it is detected that the user is walking based on the user's position and movement speed detected by a GPS (Global Positioning System) sensor mounted on the earphone 10, the sensor data for the period during which the user's walking was detected is labeled as walking sensor data.
Based on the sensor data labeled as walking sensor data, learning data obtained by learning the vibration generated by landing during walking is acquired.
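The labeling step might look like the following sketch, in which windows of IMU data recorded while the GPS-derived speed falls within a rough walking band are labeled as walking data; the speed bounds are illustrative assumptions.

```python
def label_window(imu_window, gps_speed_m_s):
    # Label an IMU window as walking data when the GPS speed falls
    # within a rough walking-speed band (illustrative bounds).
    walking = 0.5 <= gps_speed_m_s <= 2.5
    return {"samples": imu_window, "label": "walking" if walking else "other"}

print(label_window([0.1, 1.9, 0.2], gps_speed_m_s=1.4)["label"])  # walking
print(label_window([0.1, 0.1, 0.1], gps_speed_m_s=0.0)["label"])  # other
```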
Alternatively, the noise generated by walking may be constantly observed by the earphone 10 using the sensor data of the IMU 121. For example, the earphone 10 can record the sensor data of the IMU 121 as-is, or record only acceleration values exceeding a threshold value.
The learning result obtained by learning the noise based on the recorded sensor data or the acceleration values exceeding the threshold value is used in the gesture recognition processing that detects taps.
Note that the sensor data or the acceleration values exceeding the threshold value may be uploaded to a device connected to the earphone 10 (such as a smartphone) or to a server provided as a cloud service. In this case, the earphone 10 acquires the result of the noise learning performed externally and performs the gesture recognition processing using that learning result.
The flow of the noise learning processing and the gesture recognition processing executed by the information processing unit 161 will be described with reference to the sequence diagram of FIG. 17.
For example, the noise learning processing of FIG. 17 is performed before the gesture recognition processing is performed.
In step S301 of the noise learning processing, the acquisition unit 181 (FIG. 12) acquires the noise generated by walking. The history of noise detected and recorded in the past is acquired by the acquisition unit 181.
In step S302, the calculation unit 182 calculates, based on the noise history acquired in step S301, a noise removal filter that removes the noise caused by walking.
The noise removal filter calculated in step S302 is used in step S352 of the gesture recognition processing.
In step S351 of the gesture recognition processing, the acquisition unit 181 acquires the sensor data of the IMU 121.
In step S352, the calculation unit 182 applies the noise removal filter to the sensor data of the IMU 121 acquired in step S351 and corrects the sensor data of the IMU 121.
In step S353, the tap detection unit 171 executes the tap detection algorithm using the corrected sensor data.
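The description does not fix a particular filter design, but one plausible realization of the two phases of FIG. 17 is to average past walking-noise snippets into a template (steps S301 and S302) and subtract that template from incoming sensor data (step S352), as sketched below.

```python
def build_noise_template(noise_history):
    # Average the recorded noise snippets sample by sample (steps S301-S302).
    length = min(len(n) for n in noise_history)
    return [sum(n[i] for n in noise_history) / len(noise_history)
            for i in range(length)]

def apply_noise_filter(sensor_data, template):
    # Subtract the learned noise template from new data (steps S351-S352).
    return [s - t for s, t in zip(sensor_data, template)]

history = [[0.2, 1.0, 0.3], [0.1, 0.9, 0.4]]   # past walking-noise snippets
template = build_noise_template(history)        # [0.15, 0.95, 0.35]
print(apply_noise_filter([0.4, 1.1, 0.5], template))  # corrected data for S353
```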
- Second example
FIG. 18 is a block diagram showing a functional configuration example of the earphone 10 in the second example. In FIG. 18, the same components as in the configuration of the earphone 10 in FIG. 12 are given the same reference numerals. Duplicate explanations are omitted as appropriate.
The configuration of the information processing unit 161 shown in FIG. 18 differs from the configuration described with reference to FIG. 12 in that the tap detection unit 171 has a correction unit 185 instead of the calculation unit 182.
The acquisition unit 181 further acquires a music playback signal serving as the music data to be played.
The determination unit 183 determines whether the power of the music playback signal acquired by the acquisition unit 181 is equal to or greater than a threshold value.
The correction unit 185 corrects the sensor data acquired by the acquisition unit 181 based on the music playback signal acquired by the acquisition unit 181. Specifically, the correction unit 185 predicts the vibration generated when the music playback signal is reproduced, and corrects the sensor data by subtracting the acceleration value of the predicted vibration from the sensor data.
The tap detection unit 171 detects a tap based on the corrected sensor data. That is, the tap detection unit 171 functions as a recognition unit that recognizes taps based on the sensor data corrected by the correction unit 185.
FIG. 19 is a diagram schematically showing an example of the structure of the earphone 10.
Although only one terminal of the earphone 10 is shown in FIG. 19, the left ear terminal 10L and the right ear terminal 10R each have the same structure.
When music is played on the earphone 10, a diaphragm 191 provided in the earphone 10 vibrates to generate the vibration of the sound. Since the IMU 121 is provided in the vicinity of the diaphragm 191, the vibration of the diaphragm 191 may be picked up by the IMU 121.
Since the music playback signal contains various frequency components, when music is played on the earphone 10, the diaphragm 191 may generate sound vibration similar to the vibration generated by a tap operation.
FIG. 20 is a diagram showing an example of the waveform of a music playback signal.
In the graph of FIG. 20, the vertical axis represents the power of the music playback signal, and the horizontal axis represents time.
The determination unit 183 predicts, based on a music playback signal such as the waveform W21 of FIG. 20, the peaks of the vibration generated when the music playback signal is reproduced. During the periods before and after a peak timing, the tap detection unit 171 is prevented from detecting a tap. Alternatively, the correction unit 185 corrects the sensor data based on the predicted vibration.
In the following, the flow of processing is described for each of the first method, in which taps are not detected during the periods before and after the timing at which a peak of the music playback signal is reproduced, and the second method, in which the sensor data is corrected based on the music playback signal.
(About the first method)
The flow of processing executed by the information processing unit 161 (FIG. 18) will be described with reference to the flowchart of FIG. 21.
In step S401, the acquisition unit 181 acquires acceleration values as the sensor data of the IMU 121.
In step S402, the acquisition unit 181 acquires the music playback signal to be played on the earphone 10.
In step S403, the determination unit 183 determines whether the power of the music playback signal over a certain past period is equal to or greater than a threshold value. For example, the determination unit 183 makes the determination using the power of the music playback signal reproduced during a certain period up to the timing at which the sensor data of the IMU 121 was acquired.
If it is determined in step S403 that the power of the music playback signal over the certain past period is equal to or greater than the threshold value, the processing ends. That is, tap detection by the tap detection unit 171 is suppressed (not performed).
On the other hand, if it is determined in step S403 that the power of the music playback signal over the certain past period is less than the threshold value, the processing proceeds to step S404.
In step S404, the information processing unit 161 executes the tap detection algorithm described above.
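A minimal sketch of this first method follows, computing the playback power as the mean squared amplitude over a recent window and gating the tap detection on a threshold; the threshold value is illustrative.

```python
POWER_THRESHOLD = 0.25  # illustrative playback-power threshold

def signal_power(samples):
    # Mean squared amplitude over the window.
    return sum(s * s for s in samples) / len(samples)

def should_run_tap_detection(recent_playback_samples):
    # Step S403: suppress detection while loud audio may shake the IMU.
    return signal_power(recent_playback_samples) < POWER_THRESHOLD

print(should_run_tap_detection([0.05, -0.04, 0.06]))  # quiet -> True, run S404
print(should_run_tap_detection([0.9, -0.8, 0.85]))    # loud  -> False, suppress
```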
(About the second method)
The flow of processing executed by the information processing unit 161 (FIG. 18) will be described with reference to the flowchart of FIG. 22.
In step S451, the acquisition unit 181 acquires acceleration values as the sensor data of the IMU 121.
In step S452, the acquisition unit 181 acquires the music playback signal to be played on the earphone 10.
In step S453, the determination unit 183 determines whether the power of the music playback signal is equal to or greater than a threshold value. For example, the determination unit 183 makes the determination using the power of the music playback signal reproduced at the timing at which the sensor data of the IMU 121 was acquired.
If it is determined in step S453 that the power of the music playback signal is equal to or greater than the threshold value, the processing proceeds to step S454.
In step S454, the correction unit 185 corrects the sensor data by subtracting the acceleration value of the vibration predicted to be generated by reproducing the music playback signal from the acceleration value of the sensor data.
On the other hand, if it is determined in step S453 that the power of the music playback signal is less than the threshold value, the processing of step S454 is skipped.
In step S455, the tap detection unit 171 executes the tap detection algorithm described above.
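A sketch of this second method follows, under the simplifying assumption that the diaphragm-induced vibration can be approximated by scaling the playback signal with a fixed coupling gain; both the gain and the power threshold are illustrative.

```python
COUPLING_GAIN = 0.3     # how strongly playback vibration couples into the IMU
POWER_THRESHOLD = 0.25  # illustrative playback-power threshold

def correct_sensor_data(imu_samples, playback_samples):
    power = sum(s * s for s in playback_samples) / len(playback_samples)
    if power < POWER_THRESHOLD:   # step S453: quiet playback, no correction
        return imu_samples
    # Step S454: subtract the predicted playback-induced acceleration.
    predicted = [COUPLING_GAIN * s for s in playback_samples]
    return [a - p for a, p in zip(imu_samples, predicted)]

# Corrected data is then passed to the tap detection algorithm (step S455).
print(correct_sensor_data([0.5, -0.2, 0.4], [0.9, -0.8, 0.85]))
```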
As described above, in the earphone 10, the vibration generated by reproducing the music playback signal is predicted. Based on the predicted vibration, the earphone 10 suppresses tap detection or corrects the sensor data of the IMU 121.
This makes it possible for the earphone 10 to avoid erroneously detecting the vibration generated by music playback as a tap, while reliably detecting, as a tap, the vibration of a tap operation whose peaks differ from the vibration of the output sound.
Note that, as in the first example of FIG. 12, the tap detection unit 171 of FIG. 18 may be provided in the information processing unit 141 instead of the one-ear tap detection unit 151 of FIG. 3.
<3. Third Embodiment>
FIG. 23 is a block diagram showing another hardware configuration example of the earphone 10.
In FIG. 23, the same components as in the configuration of the earphone 10 in FIG. 2 are given the same reference numerals. Duplicate explanations are omitted as appropriate.
The configuration of the earphone 10 shown in FIG. 23 differs from the configuration described with reference to FIG. 2 in that an electrostatic sensor 201L is provided in the sensor unit 107L of the left ear terminal 10L and an electrostatic sensor 201R is provided in the sensor unit 107R of the right ear terminal 10R.
The electrostatic sensors 201L and 201R are composed of, for example, sensors of an X-Y coordinate detection type or an electrostatic button detection type. The electrostatic sensors 201L and 201R output, as sensor data, signals corresponding to the user's contact with them.
FIG. 24 is a block diagram showing a functional configuration example of the earphone 10.
As shown in FIG. 24, in the earphone 10, an information processing unit 211 is realized by the CPU 101 executing a predetermined program. The configuration shown in FIG. 24 can be provided in each of the left ear terminal 10L and the right ear terminal 10R, or in only one of them.
The information processing unit 211 includes a tap detection unit 221, a tap determination unit 222, and an execution unit 223.
The tap detection unit 221 acquires sensor data from the sensor unit 107 and detects body taps and face taps based on the sensor data of the IMU 121. The tap detection unit 221 also detects body taps based on the sensor data of the electrostatic sensor 201. Body taps and face taps are described later.
The tap detection result based on the sensor data of the IMU 121 and the tap detection result based on the sensor data of the electrostatic sensor 201 are supplied to the tap determination unit 222.
The tap determination unit 222 distinguishes between body taps and face taps based on the detection results supplied from the tap detection unit 221. Based on the identification result, the tap determination unit 222 controls the execution unit 223 to execute the function assigned to the body tap or the function assigned to the face tap.
The execution unit 223 executes the function assigned to the body tap or the function assigned to the face tap under the control of the tap determination unit 222.
FIG. 25 is a diagram showing examples of a body tap and a face tap.
A body tap is a tap on the housing of the earphone 10 by the user. A of FIG. 25 shows a situation in which the user U is tapping the housing of the left ear terminal 10L worn on the left ear.
A body tap is detected based on the sensor data of the electrostatic sensor 201 installed in a partial area A11 of the housing of the earphone 10. Since the electrostatic sensor 201 is installed in only a portion of the housing of the earphone 10, the user U may, when attempting a body tap, tap a portion other than the area A11.
For this reason, in the earphone 10, body taps are detected using the sensor data of the IMU 121 together with the sensor data of the electrostatic sensor 201.
On the other hand, a face tap is a tap by the user around the ear on which the earphone 10 is worn. B of FIG. 25 shows a situation in which the user U is tapping around the left ear on which the left ear terminal 10L is worn. For example, an area A12 around the left ear is the area in which a face tap can be detected.
FIG. 26 is a diagram showing the distributions of the tap strengths of body taps and face taps.
In FIG. 26, the vertical axis represents frequency, and the horizontal axis represents tap strength. The waveform W31 represents the distribution of the tap strength of body taps, and the waveform W32 represents the distribution of the tap strength of face taps. Here, the tap strength represents the intensity of the vibration detected by the IMU 121 in response to a tap.
Comparing the waveform W31 and the waveform W32, the tap strength of body taps is high, and the tap strength of face taps is low.
Since the tap strength of body taps is high, when strong vibration is detected by the IMU 121, the earphone 10 detects that a body tap has been performed. On the other hand, since the tap strength of face taps is low, when weak vibration is detected by the IMU 121, the earphone 10 detects that a face tap has been performed.
When the user taps the housing of the earphone 10 with a weak force, it is difficult to distinguish a body tap from a face tap based on the sensor data of the IMU 121 alone. Even in this case, if the user is in contact with the electrostatic sensor 201, the earphone 10 can identify the body tap based on the sensor data of the electrostatic sensor 201.
Next, the flow of processing executed by the earphone 10 (information processing unit 211) will be described with reference to the flowchart of FIG. 27.
The processing of FIG. 27 is started, for example, when tap-like vibration is detected by the IMU 121.
In step S501, the tap detection unit 221 acquires acceleration values as sensor data from the IMU 121 and detects a body tap or a face tap based on the sensor data of the IMU 121.
In step S502, the tap detection unit 221 acquires sensor data from the electrostatic sensor 201 and detects a body tap based on the sensor data of the electrostatic sensor 201.
In step S503, the tap determination unit 222 determines whether a tap has been detected by the electrostatic sensor 201.
If it is determined in step S503 that a tap has been detected by the electrostatic sensor 201, the processing proceeds to step S504. For example, when the user's contact is detected by the electrostatic sensor 201, it is determined that a tap has been detected.
In step S504, the tap determination unit 222 determines that a body tap has been performed by the user. The execution unit 223 then executes the function assigned to the body tap.
Note that, after it is determined in step S503 that a tap has been detected by the electrostatic sensor 201, whether a body tap has been detected may further be determined based on the sensor data of the IMU 121.
If it is determined that a body tap has been detected based on the sensor data of the IMU 121, the processing of step S504 is performed. If it is determined that no body tap has been detected based on the sensor data of the IMU 121, it is determined that the electrostatic sensor 201 has detected noise, and the processing ends.
This makes it possible to avoid, for example, judging contact by the user's hair as a body tap when the electrostatic sensor 201 detects such contact.
On the other hand, if it is determined in step S503 that no tap has been detected by the electrostatic sensor 201, the processing proceeds to step S505.
In step S505, the tap determination unit 222 determines whether a body tap has been detected by the IMU 121. For example, when the tap strength measured based on the sensor data of the IMU 121 is higher than a predetermined threshold value, it is determined that a body tap has been detected by the IMU 121.
If it is determined in step S505 that a body tap has been detected by the IMU 121, the processing proceeds to step S504, where, as described above, it is determined that a body tap has been performed by the user and the function assigned to the body tap is executed.
On the other hand, if it is determined in step S505 that no body tap has been detected by the IMU 121, the processing proceeds to step S506.
In step S506, the tap determination unit 222 determines whether a face tap has been detected by the IMU 121. For example, when the tap strength measured based on the sensor data of the IMU 121 is lower than a predetermined threshold value, it is determined that a face tap has been detected by the IMU 121.
If it is determined in step S506 that no face tap has been detected by the IMU 121, the processing ends. For example, when no vibration is detected by the IMU 121, it is determined that no face tap has been detected by the IMU 121.
On the other hand, if it is determined in step S506 that a face tap has been detected by the IMU 121, the processing proceeds to step S507.
In step S507, the tap determination unit 222 determines that a face tap has been performed by the user. The execution unit 223 then executes the function assigned to the face tap.
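The decision flow of FIG. 27 can be summarized in the following sketch, in which a touch on the electrostatic sensor or a strong IMU vibration is classified as a body tap and a weaker IMU vibration as a face tap; the strength thresholds are illustrative.

```python
BODY_TAP_STRENGTH = 3.0  # IMU strength above this -> body tap (illustrative)
FACE_TAP_STRENGTH = 1.0  # IMU strength above this -> face tap (illustrative)

def classify_tap(electrostatic_touch, imu_tap_strength):
    if electrostatic_touch:                     # steps S503 and S504
        return "body tap"
    if imu_tap_strength > BODY_TAP_STRENGTH:    # steps S505 and S504
        return "body tap"
    if imu_tap_strength > FACE_TAP_STRENGTH:    # steps S506 and S507
        return "face tap"
    return None                                 # no tap event

print(classify_tap(True, 0.5))   # body tap via the electrostatic sensor
print(classify_tap(False, 3.4))  # body tap via strong vibration
print(classify_tap(False, 1.6))  # face tap via weak vibration
```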
As described above, in the earphone 10, body taps and face taps are identified based on the sensor data of the electrostatic sensor 201 and the sensor data of the IMU 121.
This enables the earphone 10 to identify a body tap based on the sensor data of the IMU 121 even when the user touches an area other than the area where the electrostatic sensor 201 is installed. The earphone 10 can also identify a body tap based on the sensor data of the electrostatic sensor 201 even when the user taps the area where the electrostatic sensor 201 is installed with only a weak force.
<4. Modification Examples>
In the first embodiment, sound effects such as the operation recognition sound, the function execution sound, and the cancel sound are described as being played; however, other means may be used to feed back to the user information indicating that a tap has been detected, information indicating that a function has been executed, and information indicating that a tap detection has been canceled.
For example, an image representing the above information may be displayed on the screen of an external device such as a smartphone. Vibration or sound representing the information may be output from an external device. A combination of at least some of sound, vibration, and an image representing the information may also be output.
Although the information processing unit 141 in the first embodiment, the information processing unit 161 in the second embodiment, and the information processing unit 211 in the third embodiment have each been described as being provided in the earphone 10, some or all of the components of these information processing units may be provided in a device external to the earphone 10.
For example, the tap determination unit 154 of the information processing unit 141 may be provided in a smartphone connected to the earphone 10 by wireless or wired communication. In this case, whether the event that the one-ear tap detection unit 151 has detected a tap is a tap event is determined on the smartphone, and the determination result is transmitted to the earphone 10.
The series of processing described above can be executed by hardware or by software. When the series of processing is executed by software, the programs constituting the software are installed on a computer incorporated in dedicated hardware, a general-purpose personal computer, or the like.
The installed programs are provided by being recorded on removable media such as optical discs (CD-ROM (Compact Disc-Read Only Memory), DVD (Digital Versatile Disc), and the like) or semiconductor memory, or are provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital broadcasting. The programs can also be installed in advance in the ROM 102 or the storage unit 109 shown in FIGS. 2 and 23.
The programs executed by the computer may be programs whose processing is performed in time series in the order described in this specification, or programs whose processing is performed in parallel or at necessary timings, such as when a call is made.
The effects described in this specification are merely examples and are not limiting, and other effects may be obtained.
The embodiments of the present technology are not limited to the embodiments described above, and various modifications are possible without departing from the gist of the present technology.
For example, the present technology can take a cloud computing configuration in which one function is shared and jointly processed by a plurality of devices via a network.
Each step described in the above flowcharts can be executed by one device or shared among a plurality of devices.
Further, when a plurality of processes are included in one step, the plurality of processes included in that one step can be executed by one device or shared among a plurality of devices.
<Examples of Configuration Combinations>
The present technology can also have the following configurations.
(1)
Equipped with two terminals equipped with sensors that detect vibrations including vibrations and noises that represent user operations.
On the other hand, the terminal
A receiving unit that receives the detection result of the first sensor mounted on the other terminal, and a receiving unit.
An information processing device having a determination unit for determining whether or not the detection result by the second sensor mounted on one of the terminals is noise based on the detection result received by the reception unit.
(2)
On the other hand, the terminal further includes an execution unit that executes processing according to the detection result by the second sensor when the determination unit determines that the detection result by the second sensor is not noise. The information processing device according to 1).
(3)
On the other hand, when the determination unit determines that the detection result by the second sensor is noise, the terminal controls to notify that the detection result by the second sensor is noise. The information processing apparatus according to (2) above, further comprising a unit.
(4)
The information processing device according to (3) above, wherein the control unit controls to output at least one notification by sound, vibration, and an image notifying that the detection result by the second sensor is noise. ..
(5)
The control unit has a timing between the timing when the vibration is detected by the second sensor and the timing when the processing is executed by the execution unit, and the detection result by the second sensor is noise. The information processing apparatus according to (3) or (4) above, which controls to notify the fact.
(6)
The information processing device according to any one of (3) to (5) above, wherein the control unit further controls to output a sound at the timing when the vibration is detected by the second sensor.
(7)
The information processing device according to any one of (3) to (6) above, wherein the control unit further controls to output a sound at a timing when the processing is executed by the execution unit.
(8)
The determination unit
The degree of similarity between the detection result by the first sensor and the detection result by the second sensor is calculated.
The information processing apparatus according to any one of (1) to (7) above, which determines whether or not the detection result by the second sensor is noise based on the calculated similarity.
(9)
The information processing device according to (8) above, wherein the determination unit determines that the detection result by the second sensor is noise when the similarity is higher than a predetermined threshold value.
(10)
As the detection result of the first sensor, the receiving unit receives event information indicating that the user's operation is detected according to the sensor data of the first sensor.
The determination unit determines whether or not it is effective to detect the user's operation according to the sensor data of the second sensor based on the event information received by the reception unit (1). ) To (9).
(11)
The receiving unit receives the sensor data of the first sensor as the detection result of the first sensor, and receives the sensor data of the first sensor.
The determination unit is described in any one of (1) to (9) above, which determines whether or not the sensor data of the second sensor is noise based on the sensor data received by the reception unit. Information processing device.
(12)
The information processing device according to any one of (1) to (11), further comprising a sound output unit that outputs a sound corresponding to a sound signal according to the operation of the user.
(13)
An information processing method for an information processing device including two terminals each equipped with a sensor that detects vibration, wherein one of the terminals receives a detection result from the first sensor mounted on the other terminal, and determines, based on the received detection result, whether the detection result from the second sensor mounted on the one terminal is noise.
(14)
An information processing device including: a sensor unit that detects vibration including both vibration representing a user's operation and noise; and a recognition unit that recognizes the user's operation based on a detection result from the sensor unit and a prediction result of the noise detected by the sensor unit.
(15)
The information processing device according to (14), wherein the recognition unit recognizes the user's operation based on a result of comparing the peak interval of the vibration detected by the sensor unit with a noise period predicted from periodic vibration detected in the past.
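A minimal sketch, assuming the periodic noise is something like footsteps, of the comparison described in (15); the median-based period estimate and the 50 ms tolerance are choices made for this illustration only.

```python
import numpy as np

def matches_noise_period(past_peak_times_s, new_peak_time_s: float,
                         tolerance_s: float = 0.05) -> bool:
    """True if the new vibration peak lands where the next peak of the
    periodic noise (e.g., the next footstep) is predicted to land."""
    peaks = np.asarray(past_peak_times_s, dtype=float)
    if peaks.size < 3:
        return False                           # too little history to estimate a period
    period = float(np.median(np.diff(peaks)))  # robust estimate of the noise period
    expected = peaks[-1] + period              # predicted time of the next noise peak
    return abs(new_peak_time_s - expected) < tolerance_s
```

A peak that matches the predicted period would be attributed to the periodic noise, while an off-period peak remains a candidate for a user tap.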
(16)
The information processing device according to (14), wherein the recognition unit corrects the detection result from the sensor unit based on a learning result of the noise detected by the sensor unit, and recognizes the user's operation based on the corrected detection result.
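One simple way to read the "learning result" of (16) is a running estimate of the background vibration level that is subtracted before tap detection. The sketch below is written under that assumption; the class name and the learning rate are invented for illustration.

```python
class LearnedNoiseCorrector:
    """Keeps an exponentially weighted estimate of the background
    vibration level and reports only the part that stands out above it."""

    def __init__(self, alpha: float = 0.01):
        self.alpha = alpha        # learning rate for the noise estimate
        self.noise_level = 0.0    # learned background level

    def update_and_correct(self, magnitude: float) -> float:
        # Learn slowly from everything the sensor sees ...
        self.noise_level += self.alpha * (magnitude - self.noise_level)
        # ... and pass on only the excess over the learned background.
        return max(0.0, magnitude - self.noise_level)
```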
(17)
The information processing device according to (14), further including a sound output unit that outputs a sound based on a sound signal, wherein the recognition unit recognizes the user's operation based on the detection result from the sensor unit and a prediction result of the noise generated when the sound signal is reproduced by the sound output unit.
(18)
The information processing device according to (17), wherein the recognition unit determines, based on the vibration generated when the sound output unit outputs a sound, whether the detection result from the sensor unit is noise, and recognizes the user's operation based on a detection result determined not to be noise.
(19)
The information processing device according to (17), wherein the recognition unit corrects the detection result from the sensor unit based on the vibration generated when the sound output unit outputs a sound, and recognizes the user's operation based on the corrected detection result.
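For (17) through (19), the playback-induced noise can in principle be predicted from the sound signal itself, since the terminal knows what it is reproducing. The sketch below assumes a crude model in which the low-frequency content of the audio frame, scaled by a per-device coupling factor, approximates the vibration the diaphragm injects into the sensor; the factor 0.002 and the cut at one eighth of the bands are placeholders, and the two frames are assumed time-aligned and of equal length.

```python
import numpy as np

def predict_playback_vibration(audio_frame: np.ndarray,
                               coupling: float = 0.002) -> np.ndarray:
    """Predict the vibration the diaphragm injects into the sensor from
    the audio frame currently being reproduced (crude low-pass model)."""
    spectrum = np.fft.rfft(audio_frame)
    spectrum[len(spectrum) // 8:] = 0              # keep only the lowest bands
    low = np.fft.irfft(spectrum, n=len(audio_frame))
    return coupling * np.abs(low)

def corrected_sensor_signal(imu_frame: np.ndarray,
                            audio_frame: np.ndarray) -> np.ndarray:
    # Correction in the style of (19): remove the predicted
    # playback-induced vibration, then run the usual tap detector
    # on what remains.
    return np.clip(imu_frame - predict_playback_vibration(audio_frame), 0.0, None)
```

The gating variant of (18) would instead compare the detection result against the predicted vibration and discard detections that the prediction fully explains.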
(20)
The information processing device according to any one of (14) to (19), including two terminals on which the sensor unit is mounted, or one terminal on which the sensor unit is mounted, wherein the terminal has the recognition unit.
(21)
An information processing device including: a contact sensor that detects contact by the user; a vibration sensor that detects vibration; and a determination unit that determines whether the user has touched a housing based on a detection result from the contact sensor and a detection result from the vibration sensor.
(22)
The information processing device according to (21), wherein the contact sensor is installed in a partial area of the housing.
(23)
The information processing device according to (21) or (22), wherein the determination unit determines that the user has touched the housing when the intensity of the vibration detected by the vibration sensor is higher than a predetermined threshold.
(24)
The information processing device according to any one of (21) to (23), wherein the determination unit further determines, based on the detection result from the contact sensor and the detection result from the vibration sensor, whether the user has touched a part of the user's body to which the housing is attached.
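A hedged sketch of the sensor fusion described in (21) through (24): the electrostatic contact sensor localizes the touch to the housing, while the vibration sensor confirms that a touch actually happened; strong vibration without contact suggests a touch on the body near the attachment point. The labels and the threshold are illustrative assumptions.

```python
def classify_touch(contact_active: bool, vibration_intensity: float,
                   vibration_threshold: float = 1.5) -> str:
    """Fuse the contact sensor with the vibration sensor:
    contact + strong vibration   -> the housing itself was touched;
    strong vibration, no contact -> the wearer touched the body part
                                    on which the housing is worn;
    otherwise                    -> no touch."""
    strong = vibration_intensity > vibration_threshold
    if contact_active and strong:
        return "housing"
    if strong:
        return "body_near_housing"
    return "none"
```

Each label could then be mapped to a different function, in the manner of (25).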
(25)
The information processing device according to (24), further including an execution unit that executes a function corresponding to the determination by the determination unit that the user has touched the housing, or a function corresponding to the determination that the user has touched a part of the user's body to which the housing is attached.
(26)
The information processing device according to any one of (21) to (25), including two terminals each equipped with the contact sensor and the vibration sensor, or one terminal equipped with the contact sensor and the vibration sensor, wherein the terminal has the determination unit, and the housing is a housing of the terminal.
10 earphone, 10L left-ear terminal, 10R right-ear terminal, 106 sound output unit, 107 sensor unit, 121 IMU, 141 information processing unit, 151 one-ear tap detection unit, 152 transmission control unit, 153 reception control unit, 154 tap determination unit, 155 sound control unit, 156 execution unit, 161 information processing unit, 171 tap detection unit, 172 execution unit, 181 acquisition unit, 182 calculation unit, 183 determination unit, 175 correction unit, 191 diaphragm, 201 electrostatic sensor, 211 information processing unit, 221 tap detection unit, 222 tap determination unit, 223 execution unit

Claims (20)

1. An information processing device comprising two terminals each equipped with a sensor that detects vibration including vibration representing a user's operation and noise, wherein one of the terminals has:
   a receiving unit that receives a detection result from the first sensor mounted on the other terminal; and
   a determination unit that determines, based on the detection result received by the receiving unit, whether a detection result from the second sensor mounted on the one terminal is noise.
2. The information processing device according to claim 1, wherein the one terminal further has an execution unit that executes processing corresponding to the detection result from the second sensor when the determination unit determines that the detection result from the second sensor is not noise.
3. The information processing device according to claim 2, wherein the one terminal further has a control unit that, when the determination unit determines that the detection result from the second sensor is noise, performs control for notifying that the detection result from the second sensor is noise.
4. The information processing device according to claim 3, wherein the control unit performs control for outputting at least one of a sound, a vibration, and an image notifying that the detection result from the second sensor is noise.
5. The information processing device according to claim 3, wherein the control unit performs control for notifying that the detection result from the second sensor is noise at a timing between the time when the vibration is detected by the second sensor and the time when the processing is executed by the execution unit.
6. The information processing device according to claim 3, wherein the control unit further performs control for outputting a sound at the timing when the vibration is detected by the second sensor.
7. The information processing device according to claim 3, wherein the control unit further performs control for outputting a sound at the timing when the processing is executed by the execution unit.
8. The information processing device according to claim 1, wherein the determination unit calculates a degree of similarity between the detection result from the first sensor and the detection result from the second sensor, and determines whether the detection result from the second sensor is noise based on the calculated degree of similarity.
9. The information processing device according to claim 8, wherein the determination unit determines that the detection result from the second sensor is noise when the degree of similarity is higher than a predetermined threshold.
10. The information processing device according to claim 1, wherein the receiving unit receives, as the detection result of the first sensor, event information indicating that the user's operation has been detected based on the sensor data of the first sensor, and the determination unit determines, based on the event information received by the receiving unit, whether the detection of the user's operation based on the sensor data of the second sensor is valid.
11. The information processing device according to claim 1, wherein the receiving unit receives, as the detection result of the first sensor, the sensor data of the first sensor, and the determination unit determines, based on the sensor data received by the receiving unit, whether the sensor data of the second sensor is noise.
12. The information processing device according to claim 1, further comprising a sound output unit that outputs a sound corresponding to a sound signal in accordance with the user's operation.
13. An information processing method for an information processing device comprising two terminals each equipped with a sensor that detects vibration, wherein one of the terminals:
   receives a detection result from the first sensor mounted on the other terminal; and
   determines, based on the received detection result, whether a detection result from the second sensor mounted on the one terminal is noise.
14. An information processing device comprising:
   a sensor unit that detects vibration including vibration representing a user's operation and noise; and
   a recognition unit that recognizes the user's operation based on a detection result from the sensor unit and a prediction result of the noise detected by the sensor unit.
15. The information processing device according to claim 14, wherein the recognition unit recognizes the user's operation based on a result of comparing the peak interval of the vibration detected by the sensor unit with a noise period predicted from periodic vibration detected in the past.
16. The information processing device according to claim 14, wherein the recognition unit corrects the detection result from the sensor unit based on a learning result of the noise detected by the sensor unit, and recognizes the user's operation based on the corrected detection result.
17. The information processing device according to claim 14, further comprising a sound output unit that outputs a sound based on a sound signal, wherein the recognition unit recognizes the user's operation based on the detection result from the sensor unit and a prediction result of the noise generated when the sound signal is reproduced by the sound output unit.
18. The information processing device according to claim 17, wherein the recognition unit determines, based on the vibration generated when the sound output unit outputs a sound, whether the detection result from the sensor unit is noise, and recognizes the user's operation based on a detection result determined not to be noise.
19. The information processing device according to claim 17, wherein the recognition unit corrects the detection result from the sensor unit based on the vibration generated when the sound output unit outputs a sound, and recognizes the user's operation based on the corrected detection result.
20. The information processing device according to claim 14, comprising two terminals on which the sensor unit is mounted, or one terminal on which the sensor unit is mounted, wherein the terminal has the recognition unit.
PCT/JP2021/016706 2020-05-11 2021-04-27 Information processing device and information processing method WO2021230067A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020083049 2020-05-11
JP2020-083049 2020-05-11

Publications (1)

Publication Number Publication Date
WO2021230067A1 true WO2021230067A1 (en) 2021-11-18

Family

ID=78525851

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/016706 WO2021230067A1 (en) 2020-05-11 2021-04-27 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2021230067A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012140818A1 * 2011-04-11 2012-10-18 Panasonic Corporation Hearing aid and method of detecting vibration
JP2014153729A * 2013-02-04 2014-08-25 Sharp Corp Input determination device and portable terminal
JP2014209329A * 2013-03-15 2014-11-06 Immersion Corporation Systems and methods for parameter modification of haptic effects
JP2016095776A * 2014-11-17 2016-05-26 Lapis Semiconductor Co., Ltd. Semiconductor device, portable terminal device and operation detection method
JP2016177343A * 2015-03-18 2016-10-06 Toyota InfoTechnology Center Co., Ltd. Signal processing apparatus, input device, signal processing method, and program
JP2018042241A * 2016-09-06 2018-03-15 Apple Inc. Wireless ear bud

Similar Documents

Publication Publication Date Title
US9374647B2 (en) Method and apparatus using head movement for user interface
US10292002B2 (en) Systems and methods for delivery of personalized audio
EP3892009B1 (en) Wearable audio device with head on/off state detection
CN101765035B (en) Music reproducing system and information processing method
EP2775738B1 (en) Orientation free handsfree device
JP5973465B2 (en) Audio processing device
US10206043B2 (en) Method and apparatus for audio pass-through
US20160014539A1 (en) Earphone and sound channel control method thereof
US20160323672A1 (en) Multi-channel speaker output orientation detection
CN111698607B (en) TWS earphone audio output control method, apparatus, device and medium
CN116324969A (en) Hearing enhancement and wearable system with positioning feedback
JP7436564B2 (en) Headphones and headphone status detection method
CN107948792B (en) Left and right sound channel determination method and earphone equipment
WO2021230067A1 (en) Information processing device and information processing method
EP4124065A1 (en) Acoustic reproduction method, program, and acoustic reproduction system
CN113196792A (en) Specific sound detection apparatus, method, and program
CN113393855A (en) Active noise reduction method and device, computer readable storage medium and processor
US9319809B2 (en) Hearing loss compensation apparatus including external microphone
JP6522105B2 (en) Audio signal reproduction apparatus, audio signal reproduction method, program, and recording medium
KR101661106B1 (en) The dangerous situation notification apparatus by using 2-channel sound input-output device standing on the basis headset
JP6194740B2 (en) Audio processing apparatus, audio processing method, and program
CN114647397A (en) Earphone play control method and device, electronic equipment and storage medium
WO2023017622A1 (en) Information processing device, information processing method, and program
US20220408178A1 (en) Method and electronic device for providing ambient sound when user is in danger
CN112585993B (en) Sound signal processing system and sound signal processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21803883

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21803883

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP