WO2024058147A1 - Processing device, output device, and processing system - Google Patents

Processing device, output device, and processing system

Info

Publication number
WO2024058147A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound, section, data, unit, audible
Application number
PCT/JP2023/033103
Other languages
French (fr)
Japanese (ja)
Inventor
利知 金岡
Original Assignee
京セラ株式会社 (Kyocera Corporation)
Application filed by 京セラ株式会社 (Kyocera Corporation)
Publication of WO2024058147A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/84: Detection of presence or absence of voice signals for discriminating voice from noise

Definitions

  • the present disclosure relates to a processing device, an output device, and a processing system.
  • Patent Document 1 describes a portable music playback device that includes a notification unit that notifies the user through headphones when an external sound matches a predetermined phrase.
  • a processing device includes a control unit that, when acquiring external sound data in which a sound component in the inaudible range includes information on a sound in the audible range, detects the sound in the audible range based on the sound component in the inaudible range.
  • an output device includes a speaker and a control unit that, upon receiving data of an audible sound, generates an inaudible sound containing information of the audible sound as a component and outputs, through the speaker, a sound in which the audible sound and the inaudible sound are superimposed on each other.
  • a processing system includes an output device that, upon receiving data of a sound in the audible range, generates a sound in the inaudible range that includes information on the sound in the audible range as a component and outputs a sound in which the sound in the audible range and the sound in the inaudible range are superimposed; and a processing device that, when external sound data is acquired, detects the sound in the audible range based on a component of the sound in the inaudible range.
  • FIG. 1 is a diagram showing a schematic configuration of a processing system according to an embodiment of the present disclosure.
  • FIG. 2 is a block diagram of the output device shown in FIG. 1.
  • FIG. 3 is a diagram for explaining frame division.
  • FIG. 4 is a block diagram of the sound collector and processing device shown in FIG. 1.
  • FIG. 5 is a graph of a frequency spectrum signal.
  • FIG. 6 is a flowchart showing the flow of sound output processing executed by the output device shown in FIG. 2.
  • FIG. 7 is a flowchart showing the flow of sound collection processing executed by the sound collector shown in FIG. 4.
  • FIG. 8 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
  • FIG. 9 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
  • FIG. 10 is a block diagram of a processing device according to another embodiment.
  • FIG. 11 is a diagram for explaining processing for emphasizing overtones.
  • FIG. 12 is a flowchart showing the flow of sound acquisition processing executed by the processing device according to another embodiment.
  • FIG. 13 is a block diagram of a modification of the sound collector (processing device) shown in FIG. 4.
  • FIG. 14 is a block diagram of a modification of the sound collector (processing device) shown in FIG. 4.
  • a user may use a processing device, such as headphones or earphones, in noisy conditions. In that case, it may become difficult for the processing device to distinguish the voice to be detected from the noise.
  • according to the present disclosure, a technique for detecting sound with high accuracy can be provided.
  • the processing system 1 includes an output device 2, a sound collector 3, and a processing device 4.
  • the sound collector 3 and the processing device 4 are configured as separate devices.
  • the sound collector 3 and the processing device 4 may be configured as an integrated device as shown in FIG. 13, which will be described later.
  • the output device 2 is used, for example, in an airport waiting room or a station premises.
  • the output device 2 may be part of a public address system installed in a building.
  • the output device 2 outputs audible sound.
  • the output device 2 outputs an audible announcement sound for notifying the estimated arrival time of an airplane or train.
  • Audible sounds are sounds that can be heard by the average human ear.
  • the frequency band of the audible sound, that is, the audible range is, for example, a band of 20 [Hz] to 18 [kHz].
  • the output device 2 outputs inaudible sounds along with audible sounds.
  • Inaudible sounds are sounds that cannot be heard by the average human ear.
  • the frequency band of the inaudible sound is, for example, a band of 20 [Hz] or less, a band of 18 [kHz] to 22 [kHz], or a band of 22 [kHz] or more.
  • the output device 2 outputs inaudible sound in a band of 18 [kHz] to 22 [kHz].
  • the inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Since the inaudible sound component includes audible sound information, the processing device 4 can accurately detect the audible sound output by the output device 2 even under noisy conditions, as will be described later.
  • the sound collector 3 is, for example, an earphone. However, the sound collector 3 is not limited to earphones.
  • the sound collector 3 may be headphones or the like.
  • the sound collector 3 is worn by the user.
  • the sound collector 3 can output music and the like to the user.
  • the sound collector 3 may include an earphone section that is attached to the user's left ear, and an earphone section that is attached to the user's right ear.
  • the sound collector 3 collects external sounds around the sound collector 3. External sound is sound emitted outside the sound collector 3.
  • the sound collector 3 collects external sounds around the user by being worn by the user. That is, external sounds include sounds emitted around the user. External sounds may include sounds made by the user himself.
  • the sound collector 3 outputs the collected external sounds to the user under the control of the processing device 4. With this configuration, the user can hear the external sounds around them while wearing the sound collector 3.
  • the processing device 4 is, for example, a smartphone, a mobile phone, a tablet, or a personal computer (PC). However, the processing device 4 is not limited to this.
  • the processing device 4 is operated by a user.
  • the user can operate the processing device 4 to make settings for the sound collector 3 and the like.
  • the processing device 4 acquires data on external sounds collected by the sound collector 3.
  • the external sound collected by the sound collector 3 may include noise in addition to the audible sound and inaudible sound output by the output device 2.
  • inaudible sounds are rare in nature compared to audible sounds; most noise is therefore in the audible range. As a result, the inaudible sound output by the output device 2 is not easily affected by noise.
  • the inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Therefore, by analyzing the inaudible component of the external sound collected by the sound collector 3, the processing device 4 can accurately detect the audible sound output by the output device 2 even if the external sound includes noise.
  • the output device 2 includes an input section 10, an input section 11, a conversion section 12, a switch 13, a delay buffer 14, a superimposition section 15, a speaker 16, a storage section 17, and a control unit 18.
  • digital sound data refers to data obtained by sampling analog sound data at a preset sampling rate.
  • analog sound data refers to sound data as collected by a microphone or the like.
  • the input unit 10 can receive text data input from the user.
  • the input unit 10 includes at least one input interface that can accept input of text data.
  • the input interface includes, for example, a keyboard.
  • the input unit 10 may receive text data input from another device.
  • the input unit 10 may be configured to include at least one connection interface connectable to other devices.
  • the connection interface is an interface compatible with standards such as USB (Universal Serial Bus), HDMI (registered trademark) (High-Definition Multimedia Interface), or Bluetooth (registered trademark).
  • the input unit 11 can receive input of audible sound data from other devices. The audible sound data input from the input unit 11 is assumed to be digital data.
  • the input unit 11 includes at least one connection interface connectable to other devices. The connection interface may be the same as or similar to that of the input unit 10.
  • the conversion unit 12 acquires text data from the input unit 10 under the control of the control unit 18.
  • the converter 12 converts text data into audible sound data under the control of the controller 18 .
  • the conversion unit 12 converts text data into audible sound data by text-to-speech synthesis. Speech data used for text-to-speech synthesis may be stored in the storage unit 17. It is assumed that the audible sound data after conversion is digital sound data.
  • the switch 13 is connected between the conversion section 12, the input section 11, the delay buffer 14, and the control section 18.
  • the switch 13 switches the electrical connection relationship between the input section 11 , the conversion section 12 , the delay buffer 14 , and the control section 18 based on the control of the control section 18 .
  • the switch 13 includes, for example, an arbitrary switching element such as a transistor.
  • the delay buffer 14 is a temporary storage memory.
  • the delay buffer 14 acquires audible sound data from the switch 13 under the control of the control unit 18 .
  • the delay buffer 14 holds the acquired audible sound data for a predetermined period of time.
  • the predetermined time is, for example, the time from when the extraction unit 19 (described later) acquires audible sound data from the switch 13 until when the generation unit 20 (described below) outputs inaudible sound data to the superimposition unit 15.
  • after holding the audible sound data for the predetermined time, the delay buffer 14 outputs the audible sound data to the superimposition unit 15.
  • the delay buffer 14 is configured to include the same or similar components as the storage unit 17, which will be described later.
  • the delay buffer 14 may be part of the storage unit 17.
  • the superimposition unit 15 acquires audible sound data from the delay buffer 14 under the control of the control unit 18.
  • the superimposition unit 15 acquires inaudible sound data from the control unit 18 under the control of the control unit 18 .
  • the superimposing section 15 superimposes the audible sound data from the delay buffer 14 and the inaudible sound data from the control section 18 under the control of the control section 18 .
  • the superimposing unit 15 outputs to the speaker 16 sound data in which audible sound data and inaudible sound data are superimposed.
  • the speaker 16 is capable of outputting sound.
  • the speaker 16 is, for example, a loudspeaker that can convert electrical signals into sound.
  • the speaker 16 acquires sound data, which is an electrical signal, from the superimposing section 15 under the control of the control section 18 . Under the control of the control unit 18, the speaker 16 converts the acquired sound data into sound and outputs the sound.
  • the storage unit 17 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
  • the semiconductor memory is, for example, RAM (Random Access Memory) or ROM (Read Only Memory).
  • the RAM is, for example, SRAM (Static Random Access Memory) or DRAM (Dynamic Random Access Memory).
  • the ROM is, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory).
  • the storage unit 17 may function as a main storage device, an auxiliary storage device, or a cache memory.
  • the storage unit 17 stores data used for the operation of the output device 2 and data obtained by the operation of the output device 2.
  • the storage unit 17 stores audio data used by the conversion unit 12 for text-to-speech synthesis.
  • the control unit 18 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • the control unit 18 executes processing related to the operation of the output device 2 while controlling each part of the output device 2 .
  • the control unit 18 acquires audible sound data from the input unit 11 or the conversion unit 12 via the switch 13. Upon acquiring the audible sound data, the control unit 18 generates inaudible sound data that includes audible sound information as a sound component. In order to execute this process, in this embodiment, the control unit 18 includes an extraction unit 19 and a generation unit 20. Details of the processing by the extraction unit 19 and the generation unit 20 will be described later.
  • the user inputs text data of an announcement sound that he wants to output from the output device 2 from the input unit 10.
  • the user inputs data of an audible sound that is an announcement sound that the user wants to output from the output device 2 from the input unit 11 .
  • when the text data of the announcement sound is input from the input unit 10, the control unit 18 receives the input of the text data from the input unit 10. The control unit 18 causes the conversion unit 12 to convert the received text data into audible sound data. The control unit 18 electrically connects the conversion unit 12, the delay buffer 14, and the extraction unit 19 using the switch 13. With such a configuration, the audible sound data converted by the conversion unit 12 is output from the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13.
  • when audible sound data, which is an announcement sound, is input from the input unit 11, the control unit 18 receives the input of the audible sound data from the input unit 11. The control unit 18 then electrically connects the input unit 11 with the delay buffer 14 and the extraction unit 19 using the switch 13. With this configuration, the audible sound data received from the input unit 11 is output from the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13.
  • the extraction unit 19 acquires audible sound data from the input unit 11 or the conversion unit 12 via the switch 13.
  • the extraction unit 19 extracts audible sound information to be included in the inaudible sound component output from the output device 2 from the acquired audible sound data.
  • the extraction unit 19 extracts the fundamental frequency of the audible sound as the audible sound information to be included in the inaudible sound component.
  • the extraction unit 19 extracts the fundamental frequency from the audible sound data by short-time Fourier transform (STFT).
  • the extraction unit 19 may extract the fundamental frequency from the audible sound data using any method. This processing by the extraction unit 19 will be explained below.
  • the extraction unit 19 divides the audible sound data into frames having a predetermined frame length. For example, the extraction unit 19 divides the audible sound data such that one frame contains hundreds to thousands of sound samples. The extraction unit 19 divides the audible sound data into frames at intervals of 1/2 to 1/4 of the frame length. This will be explained below with reference to FIG. 3.
  • the left side of Figure 3 shows a graph of audible sound data.
  • the horizontal axis of this graph is time.
  • in the example of FIG. 3, the frame length is L1.
  • the extraction unit 19 divides the audible sound data into frames every 1 ⁇ 3 of the frame length L1.
  • the extraction unit 19 obtains frames Fr1, Fr2, and Fr3 as shown on the right side of FIG. 3 by dividing the audible sound data into frames every 1 ⁇ 3 of the frame length L1.
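As a rough illustration of this frame division, the following Python sketch divides sampled sound data into overlapping frames; the 1024-sample frame length and the 1/3-frame hop are assumed values chosen to match the FIG. 3 example, not values prescribed by the embodiment.

```python
# A minimal sketch of the frame division described above, assuming a
# 1024-sample frame length L1 and a hop of L1/3 (hypothetical values).
import numpy as np

def split_into_frames(samples: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Divide sampled sound data into overlapping frames (Fr1, Fr2, Fr3, ...)."""
    n_frames = 1 + max(0, (len(samples) - frame_len) // hop)
    return np.stack([samples[i * hop : i * hop + frame_len] for i in range(n_frames)])

# Example: hundreds-to-thousands of samples per frame, shifted by L1/3 as in FIG. 3.
audible = np.random.randn(48_000)        # stand-in for one second at 48 kHz
frames = split_into_frames(audible, frame_len=1024, hop=1024 // 3)
```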
  • the extraction unit 19 extracts the fundamental frequency from the audible sound sampling data included in the frame for each frame.
  • the extraction unit 19 may extract the fundamental frequency based on the period at which the autocorrelation function of the audible sound samples included in the frame peaks.
  • the extraction unit 19 may extract the fundamental frequency using the cepstral method.
  • the extraction unit 19 outputs fundamental frequency data extracted for each frame to the generation unit 20.
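The autocorrelation-based extraction mentioned above can be sketched as follows; the 80-400 Hz voice-pitch search range and the 48 kHz sampling rate are illustrative assumptions, not values from the embodiment.

```python
# A sketch of autocorrelation pitch extraction: the fundamental frequency is
# taken from the lag at which the autocorrelation of a frame peaks.
import numpy as np

def fundamental_frequency(frame: np.ndarray, fs: int,
                          f_min: float = 80.0, f_max: float = 400.0) -> float:
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    lo, hi = int(fs / f_max), int(fs / f_min)    # lag range for voice pitch
    lag = lo + int(np.argmax(ac[lo:hi]))         # lag of the strongest peak
    return fs / lag

# e.g. a 220 Hz test tone in one 1024-sample frame
tone = np.sin(2 * np.pi * 220 * np.arange(1024) / 48_000)
print(fundamental_frequency(tone, fs=48_000))    # ~220 Hz
```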
  • the generation unit 20 acquires fundamental frequency data for each frame from the extraction unit 19.
  • the generation unit 20 generates inaudible sound data that includes a fundamental frequency, which is audible sound information, as a sound component.
  • the generation unit 20 generates a sine wave for each frame based on fundamental frequency data.
  • This sine wave is inaudible sound data.
  • inaudible sound data to be output from the output device 2 is generated.
  • the generation unit 20 generates a sine wave x(t) at time t using equation (1):
  • x(t) = A sin{2πt × (f0 + X)} … (1)
  • by equation (1), a sine wave x(t) having the fundamental frequency f0 as a component can be generated.
  • the amplitude A may be set based on the intensity of audible sound data.
  • the constant X may be set based on the inaudible range used in the processing system 1.
  • the constant X is, for example, 18 [kHz].
  • after generating a sine wave for each frame, the generation unit 20 synthesizes the sine waves of the frames to generate the inaudible sound data.
  • the generation unit 20 may multiply the sine wave of each frame by a window function so that the frame ends are tapered, and then synthesize the sine waves of the frames.
  • the window function is, for example, a Hamming window function.
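Putting equation (1), the window function, and the synthesis of the per-frame sine waves together, a minimal sketch might look like the following; the amplitude, frame geometry, and 48 kHz rate are assumptions, with X = 18 [kHz] as in the example above.

```python
# A minimal sketch combining equation (1), a Hamming window, and overlap-add
# synthesis of the per-frame sine waves into inaudible sound data.
import numpy as np

FS, X = 48_000, 18_000.0                 # sampling rate and constant X [Hz]

def inaudible_from_f0(f0_list, frame_len=1024, hop=1024 // 3, amp=0.1):
    out = np.zeros(hop * (len(f0_list) - 1) + frame_len)
    window = np.hamming(frame_len)       # tapers the frame ends
    for i, f0 in enumerate(f0_list):
        t = (np.arange(frame_len) + i * hop) / FS       # absolute time axis
        sine = amp * np.sin(2 * np.pi * t * (f0 + X))   # equation (1)
        out[i * hop : i * hop + frame_len] += window * sine
    return out

inaudible = inaudible_from_f0([220.0, 225.0, 230.0])    # one sine per frame
```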
  • after generating the inaudible sound data, the generation unit 20 outputs the generated inaudible sound data to the superimposition unit 15.
  • the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data.
  • the control unit 18 causes the superimposition unit 15 to output the generated sound data to the speaker 16.
  • the control unit 18 causes the speaker 16 to convert the sound data from the superimposing unit 15 into sound and output it.
  • the sound collector 3 includes a microphone 30, a speaker 31, a communication section 32, a storage section 33, and a control section 34.
  • in FIG. 4, the main flow of data is shown by solid lines.
  • the microphone 30 is capable of collecting external sounds around the sound collector 3.
  • the microphone 30 includes a left microphone and a right microphone.
  • the left microphone may be included in an earphone unit included in the sound collector 3 that is attached to the user's left ear.
  • the right microphone may be included in an earphone unit, included in the sound collector 3, that is attached to the user's right ear.
  • the microphone 30 is a stereo microphone or the like.
  • the speaker 31 is capable of outputting sound.
  • the speaker 31 includes a left speaker and a right speaker.
  • the left speaker may be included in an earphone unit included in the sound collector 3 that is attached to the user's left ear.
  • the right speaker may be included in an earphone unit, included in the sound collector 3, that is attached to the user's right ear.
  • the speaker 31 is a stereo speaker or the like.
  • the communication unit 32 includes at least one communication module that can communicate with the processing device 4 via a communication line.
  • the communication module is a communication module compatible with communication line standards.
  • the standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard including Bluetooth (registered trademark), infrared rays, NFC (Near Field Communication), and the like.
  • the storage unit 33 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
  • the semiconductor memory is, for example, RAM or ROM.
  • the RAM is, for example, SRAM or DRAM.
  • the ROM is, for example, an EEPROM.
  • the storage unit 33 may function as a main storage device, an auxiliary storage device, or a cache memory.
  • the storage unit 33 stores data used for the operation of the sound collector 3 and data obtained by the operation of the sound collector 3.
  • the storage unit 33 stores system programs, application programs, embedded software, and the like.
  • the control unit 34 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU or GPU, or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA or an ASIC.
  • the control unit 34 executes processing related to the operation of the sound collector 3 while controlling each part of the sound collector 3.
  • the control unit 34 includes an acquisition unit 35, a playback unit 36, and a storage unit 37.
  • the storage section 37 is configured to include the same or similar components as the storage section 33. At least a portion of the storage section 37 may be a portion of the storage section 33. The operation of the storage section 37 is executed by the processor of the control section 34 or the like.
  • the acquisition unit 35 acquires external sound digital data from the external sound analog data collected by the microphone 30. For example, the acquisition unit 35 acquires digital data of external sound by sampling analog data of external sound at a preset sampling rate.
  • the acquisition unit 35 outputs the digital data of the external sound to the reproduction unit 36. Further, the acquisition unit 35 transmits the digital data of the external sound to the processing device 4 through the communication unit 32 .
  • the acquisition unit 35 may acquire digital data of the left external sound from analog data of the external sound collected by the left microphone. Further, the acquisition unit 35 may acquire digital data of the right external sound from analog data of the external sound collected by the right microphone. The acquisition unit 35 may output the left external sound digital data and the right external sound digital data to the playback unit 36 . The acquisition unit 35 may transmit the left external sound digital data and the right external sound digital data to the processing device 4 through the communication unit 32.
  • when the left external sound digital data and the right external sound digital data are not particularly distinguished, they will also be simply referred to as "external sound digital data."
  • the reproduction unit 36 acquires digital data of external sound from the acquisition unit 35.
  • the playback unit 36 receives the replay flag from the processing device 4 through the communication unit 32 .
  • the replay flag is set to True or False by the processing device 4, as will be described later. When the replay flag is False, the sound collector 3 operates in a through mode; when it is True, the sound collector 3 operates in a reproduction mode.
  • the through mode is a mode in which the external sound collected by the sound collector 3 is output from the sound collector 3 to the user without going through the processing device 4 .
  • when the sound collector 3 operates in the through mode, the user can hear surrounding external sounds while wearing the sound collector 3.
  • the reproduction mode is a mode in which reproduction data acquired by the sound collector 3 from the processing device 4 is output from the speaker 31.
  • in the through mode, the reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35.
  • when the reproduction unit 36 acquires the left and right external sound digital data from the acquisition unit 35, it may output the left external sound digital data to the left speaker of the speaker 31 and the right external sound digital data to the right speaker of the speaker 31.
  • in the reproduction mode, the playback unit 36 causes the speaker 31 to output the playback data accumulated in the storage unit 37.
  • the storage unit 37 receives playback data from the processing device 4 through the communication unit 32.
  • the storage unit 37 holds the received reproduction data.
  • the storage unit 37 may receive the left reproduction data and the right reproduction data from the processing device 4 and hold them.
  • the playback unit 36 may output the left playback data stored in the storage unit 37 to the left speaker of the speaker 31, and the right playback data stored in the storage unit 37 to the right speaker of the speaker 31.
  • the processing device 4 includes a communication section 40, an input section 41, a notification section 42, a storage section 46, a filter section 47, and a control section 50.
  • the communication unit 40 is configured to include at least one communication module that can communicate with the sound collector 3 via a communication line.
  • the communication module is a communication module compatible with communication line standards.
  • the standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard including Bluetooth (registered trademark), infrared rays, NFC, and the like.
  • the input unit 41 can accept input from the user.
  • the input unit 41 includes at least one input interface that can accept input from a user.
  • the input interface is, for example, a physical key, a capacitive key, a pointing device, a touch screen provided integrally with the display of the display unit 43, a microphone, or the like.
  • the notification unit 42 can notify the user of information.
  • the notification section 42 includes a display section 43, a vibration section 44, and a light emitting section 45.
  • the components included in the notification unit 42 are not limited to these.
  • the notification unit 42 may include any component that can notify the user of information.
  • the display unit 43 can display data.
  • the display unit 43 notifies the user of information according to the data by displaying the data.
  • the display unit 43 is, for example, a display.
  • the display is, for example, an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display.
  • the vibration unit 44 can vibrate the processing device 4.
  • the vibrating unit 44 notifies the user of information according to the vibration mode by vibrating the processing device 4.
  • the vibrating section 44 includes, for example, a vibrating element such as a piezoelectric element.
  • the light emitting section 45 is capable of emitting light.
  • the light emitting unit 45 notifies the user of information according to the light emission mode by emitting light.
  • the light emitting unit 45 includes, for example, an LED (Light Emitting Diode).
  • the storage unit 46 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
  • the semiconductor memory is, for example, RAM or ROM.
  • the RAM is, for example, SRAM or DRAM.
  • the ROM is, for example, an EEPROM.
  • the storage unit 46 may function as a main storage device, an auxiliary storage device, or a cache memory.
  • the storage unit 46 stores data used for the operation of the processing device 4 and data obtained by the operation of the processing device 4.
  • the storage unit 46 stores system programs, application programs, embedded software, and the like.
  • the storage unit 46 stores a setting pattern described below.
  • the filter unit 47 is configured to include, for example, a bandpass filter for the audible range through which only audible sound data can pass, and a bandpass filter for the inaudible range through which only inaudible sound data can pass.
  • a bandpass filter for the audible range can pass only sound data of, for example, 20 [Hz] to 10 [kHz].
  • the bandpass filter for the inaudible range can pass sound data of, for example, 18 [kHz] to 22 [kHz].
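A minimal sketch of such a filter unit, assuming the example passbands above, a 48 kHz sampling rate, and Butterworth band-pass filters (the filter type and order are not specified in the embodiment):

```python
# A sketch of the filter unit 47: one band-pass filter for the audible range
# (20 Hz - 10 kHz here) and one for the inaudible range (18 - 22 kHz).
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48_000
sos_audible = butter(8, [20, 10_000], btype="bandpass", fs=FS, output="sos")
sos_inaudible = butter(8, [18_000, 22_000], btype="bandpass", fs=FS, output="sos")

def split_external_sound(external: np.ndarray):
    """Separate external sound digital data into audible and inaudible data."""
    return sosfilt(sos_audible, external), sosfilt(sos_inaudible, external)
```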
  • the control unit 50 uses the filter unit 47 to separate the received external sound digital data into inaudible sound data and audible sound data.
  • the filter section 47 outputs the inaudible sound data to the section detection section 51 and the pattern detection section 53.
  • the filter section 47 outputs audible sound data to the section buffer 52.
  • the filter unit 47 may divide the left external sound digital data into left inaudible sound data and left audible sound data based on the control of the control unit 50.
  • the filter unit 47 may divide the digital data of the external sound for the right into inaudible sound data for the right and audible sound data for the right based on the control of the control unit 50.
  • the filter section 47 may output the data of the left and right inaudible sounds to the section detection section 51 and the pattern detection section 53.
  • the filter unit 47 may output left and right audible sound data to the section buffer 52 .
  • when the left audible sound data and the right audible sound data are not particularly distinguished, they will also be simply referred to as "audible sound data." Similarly, when the left inaudible sound data and the right inaudible sound data are not particularly distinguished, they are simply referred to as "inaudible sound data."
  • the control unit 50 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU or GPU, or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA or an ASIC.
  • the control unit 50 executes processing related to the operation of the processing device 4 while controlling each part of the processing device 4 .
  • the control section 50 includes a section detection section 51, a section buffer 52, a pattern detection section 53, and a reproduction section 54.
  • the section buffer 52 may be part of the storage section 46. The operation of the section buffer 52 is executed by the processor of the control unit 50 or the like.
  • the section detection section 51 acquires inaudible sound data from the filter section 47.
  • the section detection unit 51 detects a sound section in which a sound continues in the audible sound data based on the inaudible sound data.
  • the output device 2 generates inaudible sound data that includes audible sound information as a component. Therefore, when the audible sound data includes a sound section, the inaudible sound data corresponding to that section includes sound data other than noise. In the present embodiment, the section detection unit 51 therefore detects a sound section by determining whether or not the inaudible sound data includes sound data other than noise. An example of the processing of the section detection unit 51 will be described below.
  • the section detection unit 51 divides the inaudible sound data into frames having a predetermined frame length. For example, the section detection unit 51 divides the inaudible sound data such that one frame includes hundreds to thousands of sound samples. The section detection unit 51 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length, in the same or a similar manner to the process described above with reference to FIG. 3.
  • after dividing the inaudible sound data into frames, the section detection unit 51 determines for each frame, by the following process, whether or not the frame includes sound data other than noise.
  • the section detection unit 51 performs Fast Fourier Transform (FFT) on the sound data included in each frame, and obtains a frequency spectrum signal for each frame.
  • This frequency spectrum signal becomes a frequency spectrum signal sg1 as shown in FIG. 5, which will be described later.
  • This frequency spectrum signal corresponds to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2.
  • the section detection unit 51 calculates the power of the frequency spectrum signal for each frame by squaring the frequency spectrum signal, for example.
  • the section detection unit 51 determines whether the power of the frequency spectrum signal for each frame is greater than or equal to the power threshold.
  • the power threshold may be set in advance as a fixed value.
  • the section detection unit 51 may set a power threshold for each frame.
  • the section detection unit 51 may calculate the power threshold for each frame based on the statistical estimation of the noise power for each frame.
  • the statistical estimation is, for example, an average value or a variance value.
  • when the section detection unit 51 determines that the power of the frequency spectrum signal is equal to or greater than the power threshold, it determines that the frame corresponding to the frequency spectrum signal includes sound data other than noise. When it determines that the power is less than the power threshold, the section detection unit 51 determines that the frame does not include sound data other than noise.
  • the section detection unit 51 detects, as a sound section, a section in which a frame determined to include sound data other than noise continues.
  • the section detection unit 51 may detect a sound section by executing hangover processing.
  • the interval detection unit 51 may detect the sound interval using other methods.
  • the section detection unit 51 may detect a sound section by separating clusters of noise from clusters of sound data other than noise using a Gaussian Mixture Model (GMM).
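A minimal sketch of the basic power-threshold detection described above (the hangover and GMM variants are omitted); the fixed threshold is an assumption, and per the embodiment it may instead be estimated per frame from statistics (mean, variance) of the noise power:

```python
# A sketch of section detection: per-frame FFT power of the inaudible data is
# compared against a threshold, and runs of consecutive above-threshold frames
# become sound sections.
import numpy as np

def detect_sound_sections(inaudible_frames: np.ndarray, power_threshold: float):
    """Return (start, end) frame indices of sections with non-noise sound."""
    spectra = np.fft.rfft(inaudible_frames, axis=1)     # FFT per frame
    power = (np.abs(spectra) ** 2).sum(axis=1)          # power per frame
    voiced = power >= power_threshold                   # "True frames"
    sections, start = [], None
    for i, flag in enumerate(voiced):
        if flag and start is None:
            start = i                                   # sound section start
        elif not flag and start is not None:
            sections.append((start, i))                 # sound section end
            start = None
    if start is not None:
        sections.append((start, len(voiced)))
    return sections
```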
  • the section detection unit 51 may acquire left inaudible sound data and right inaudible sound data from the filter unit 47.
  • the section detection unit 51 may detect a sound section in which a sound continues in the audible sound data based on both the left and right inaudible sound data. For example, in the same manner as or similar to the above, the section detection unit 51 divides the left inaudible sound data into frames to obtain left frames, and obtains a frequency spectrum signal for each left frame. Furthermore, the section detection unit 51 divides the right inaudible sound data into frames to obtain right frames, and obtains a frequency spectrum signal for each right frame.
  • the section detection unit 51 determines whether the power of the frequency spectrum signal for each frame for the left and right is equal to or greater than the power threshold.
  • hereinafter, a frame in which the power of the frequency spectrum signal is equal to or greater than the power threshold is referred to as a "True frame."
  • for example, the section detection unit 51 determines whether at least one of the left frame and the right frame is a True frame.
  • if at least one of them is a True frame, the section detection unit 51 determines that the inaudible sound frame corresponding to the left and right frames includes sound data other than noise.
  • the inaudible sound frame corresponding to the left and right frames is obtained by regarding the left inaudible sound frame and the right inaudible sound frame as one frame.
  • alternatively, the section detection unit 51 may determine whether both the left frame and the right frame are True frames. In this case, if the section detection unit 51 determines that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames includes sound data other than noise. On the other hand, if the section detection unit 51 does not determine that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames does not include sound data other than noise.
  • when the section detection unit 51 detects a sound section, it generates a section ID.
  • the section ID is identification information that can uniquely identify a sound section.
  • the section detection section 51 outputs the information on the sound section and the section ID to the section buffer 52 and the pattern detection section 53.
  • the sound interval information includes information on the start point and end point of the sound interval. The start point and end point of the sound section are specified, for example, by time or the like.
  • the section buffer 52 acquires audible sound data from the filter section 47. Furthermore, the section buffer 52 acquires information on the sound section and the section ID from the section detection section 51. The section buffer 52 extracts audible sound data included in the sound section from among the audible sound data obtained from the filter section 47 based on the sound section information obtained from the section detection section 51. The section buffer 52 stores audible sound data of the extracted sound section in association with the section ID.
  • the section buffer 52 may acquire left and right audible sound data from the filter unit 47. In this case, the section buffer 52 may extract and hold audible sound data for the left and right sections of the sound section.
  • after a predetermined time has elapsed, the section buffer 52 may delete the held audible sound data of the sound section.
  • the predetermined time may be set based on the amount of data that can be held in the section buffer 52.
  • the pattern detection unit 53 acquires inaudible sound data from the filter unit 47. Furthermore, the pattern detection unit 53 acquires information on the sound interval and the interval ID from the interval detection unit 51.
  • when the pattern detection unit 53 acquires the sound section information and the section ID from the section detection unit 51, it extracts the inaudible sound data of the sound section from among the inaudible sound data acquired from the filter unit 47.
  • the pattern detection unit 53 determines whether the inaudible sound component of the extracted sound section satisfies a preset condition. In this embodiment, this condition is that the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound section matches a setting pattern described below. The processing of the pattern detection unit 53 will be explained below.
  • the pattern detection unit 53 divides the extracted inaudible sound data of the sound section into frames having a predetermined frame length. For example, the pattern detection unit 53 divides the inaudible sound data such that one frame contains hundreds to thousands of sound samples. The pattern detection unit 53 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length, in the same or a similar manner to the process described above with reference to FIG. 3.
  • the pattern detection unit 53 performs fast Fourier transform on the digital sound data included in each frame, and obtains a frequency spectrum signal for each frame.
  • This frequency spectrum signal corresponds to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2.
  • the pattern detection unit 53 determines whether the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound section matches the set pattern.
  • the pattern detection unit 53 may determine whether the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound interval matches the set pattern by calculating a two-dimensional cross-correlation.
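A sketch of such a two-dimensional cross-correlation check, assuming the frequency spectrum signals of the sound section are stacked into a spectrogram array at least as large as the setting pattern; the normalization and the 0.8 match threshold are illustrative assumptions:

```python
# A sketch of the pattern match: the section's inaudible spectrogram is
# cross-correlated in 2D with the setting pattern, and a match is declared
# when the normalized peak exceeds a threshold.
import numpy as np
from scipy.signal import correlate2d

def matches_setting_pattern(spectrogram: np.ndarray, pattern: np.ndarray,
                            threshold: float = 0.8) -> bool:
    spec = (spectrogram - spectrogram.mean()) / (spectrogram.std() + 1e-9)
    pat = (pattern - pattern.mean()) / (pattern.std() + 1e-9)
    corr = correlate2d(spec, pat, mode="valid") / pat.size
    return bool(corr.max() >= threshold)
```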
  • the setting pattern is generated based on preset audible sound data, for example.
  • the setting pattern is generated based on, for example, a preset fundamental frequency of an audible sound, the length of the audible sound, and the speed of the audible sound.
  • the setting pattern may be generated based on the shape of the spectral signal of the fundamental frequency components obtained when preset audible sound data is converted into such a signal. For example, assume that the output device 2 is used in a waiting room at an airport and that the user does not want to miss information regarding "Flight 153." In this case, the setting pattern is generated based on the audible sound data of "Flight 153" output by the output device 2.
  • FIG. 5 shows a graph of the frequency spectrum signal sg1.
  • the frequency spectrum signal sg1 is obtained by the pattern detection unit 53 performing fast Fourier transform on inaudible sound data.
  • the horizontal axis indicates time.
  • the vertical axis indicates frequency.
  • the upper part of FIG. 5 shows digital data of audible sounds for reference.
  • the section from time t1 to time t2 is a sound section.
  • in the example of FIG. 5, the pattern detection unit 53 determines that the frequency spectrum signal sg2 included in the dotted-line portion of the frequency spectrum signal sg1 matches the set pattern.
  • when the pattern detection unit 53 determines that the shape of at least a portion of the frequency spectrum signal of the inaudible sound in the sound section matches the set pattern, it outputs a reproduction trigger and the section ID to the reproduction unit 54.
  • the pattern detection unit 53 may acquire left inaudible sound data and right inaudible sound data from the filter unit 47. In this case, the pattern detection unit 53 may determine whether both components of the left inaudible sound and the right inaudible sound in the sound section satisfy a preset condition.
  • the playback unit 54 acquires the playback trigger and section ID from the pattern detection unit 53.
  • the playback unit 54 executes playback processing upon acquiring the playback trigger.
  • the playback unit 54 acquires audible sound data associated with the section ID from the section buffer 52.
  • the reproduction unit 54 sets the replay flag to True, and transmits the acquired audible sound data to the sound collector 3 as reproduction data through the communication unit 40. After transmitting all the reproduction data to the sound collector 3, the reproduction unit 54 sets the replay flag to False.
  • the playback unit 54 may transmit the left audible sound data to the sound collector 3 as left playback data. Furthermore, the reproduction unit 54 may transmit data of the audible sound for the right to the sound collector 3 as reproduction data for the right.
  • the control unit 50 may cause the display unit 43 to display the text corresponding to the set pattern and the detection time at which the sound satisfying the condition was detected.
  • the text corresponding to the setting pattern is a verbalization of the audible sound used to generate the setting pattern. For example, if the text corresponding to the setting pattern is "Flight 512" and the detection time is 11:11, the control unit 50 causes the display unit 43 to display the information "Flight 512, detection time 11:11". .
  • the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the vibration unit 44 to vibrate the processing device 4.
  • the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the light emitting unit 45 to emit light.
  • the control unit 50 may receive a reproduction instruction through the input unit 41.
  • for example, when the text and detection time are displayed on the display unit 43 and the input unit 41 is a touch screen provided integrally with the display of the display unit 43, the reproduction instruction may be a touch operation on the displayed text and detection time.
  • upon receiving the reproduction instruction, the control unit 50 outputs the text for which the touch operation was received, the section ID corresponding to the detection time, and a reproduction trigger to the reproduction unit 54.
  • the reproduction unit 54 executes the reproduction process as described above.
  • FIG. 6 is a flowchart showing the flow of the sound output processing executed by the output device 2 shown in FIG. 2. Hereinafter, it is assumed that the user inputs text data from the input unit 10.
  • the control unit 18 starts processing in step S1.
  • the control unit 18 receives input of text data through the input unit 10 (step S1).
  • the control unit 18 causes the conversion unit 12 to convert the text data received in the process of step S1 into audible sound data (step S2).
  • the control unit 18 electrically connects the conversion unit 12, the delay buffer 14, and the extraction unit 19 using the switch 13.
  • upon acquiring the audible sound data from the switch 13, the extraction unit 19 divides the acquired audible sound data into frames (step S3). The extraction unit 19 then extracts the fundamental frequency from the audible sound samples included in each frame (step S4).
  • when the generation unit 20 acquires the fundamental frequency data for each frame from the extraction unit 19, it generates a sine wave for each frame based on the fundamental frequency data (step S5). After generating the sine waves, the generation unit 20 synthesizes the sine waves of the frames to generate inaudible sound data (step S6).
  • the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data (step S7).
  • the control unit 18 converts the sound data generated in the process of step S7 into sound and outputs the sound using the speaker 16 (step S8).
  • alternatively, the control unit 18 may receive input of audible sound data through the input unit 11.
  • in this case, the control unit 18 electrically connects the input unit 11, the delay buffer 14, and the extraction unit 19 using the switch 13.
  • FIG. 7 is a flowchart showing the flow of the sound collection processing executed by the sound collector 3 shown in FIG. 4.
  • the control unit 34 starts the process of step S11.
  • the control unit 34 determines whether the replay flag from the processing device 4 is True (step S11). When the control unit 34 determines that the replay flag is True (step S11: YES), the process proceeds to step S12. When the control unit 34 determines that the replay flag is False (step S11: NO), the process proceeds to step S13.
  • in step S12, the control unit 34 performs control so that the sound collector 3 operates in the reproduction mode.
  • the playback section 36 causes the speaker 31 to output the playback data accumulated in the storage section 37 .
  • in step S13, the control unit 34 performs control so that the sound collector 3 operates in the through mode.
  • the reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35.
  • after the process of step S12 or step S13, the control unit 34 returns to the process of step S11.
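The mode switching of FIG. 7 can be summarized in the following sketch; the speaker, storage, and acquisition interfaces are hypothetical stand-ins for the units described above:

```python
# A sketch of the loop in FIG. 7: the replay flag received from the processing
# device selects between the reproduction mode and the through mode.
def sound_collection_step(replay_flag: bool, speaker, storage, acquisition):
    if replay_flag:                            # step S11: YES -> step S12
        speaker.output(storage.playback_data())        # reproduction mode
    else:                                      # step S11: NO  -> step S13
        speaker.output(acquisition.external_sound())   # through mode
```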
  • FIGS. 8 and 9 are flowcharts showing the flow of the sound acquisition processing executed by the processing device 4 shown in FIG. 4. For example, when the transmission of external sound digital data from the sound collector 3 to the processing device 4 is started, the control unit 50 starts the process of step S21.
  • the control unit 50 receives digital data of external sound from the sound collector 3 through the communication unit 40 (step S21).
  • the control unit 50 uses the filter unit 47 to separate the received external sound digital data into inaudible sound data and audible sound data (step S22).
  • the section detection unit 51 divides the inaudible sound data acquired from the filter unit 47 into frames (step S23).
  • the section detection unit 51 detects a sound section by determining whether the frame includes sound data other than noise (step S24).
  • the section buffer 52 extracts audible sound data included in the sound section from among the audible sound data obtained from the filter section 47 based on the sound section information obtained from the section detection section 51.
  • the section buffer 52 stores the audible sound data of the extracted sound section in association with the section ID (step S25).
  • the pattern detection unit 53 divides the inaudible sound data of the sound section detected in the process of step S24 into frames (step S26).
  • the pattern detection unit 53 performs fast Fourier transform on the digital sound data included in each frame, and obtains a frequency spectrum signal for each frame (step S27).
  • the pattern detection unit 53 determines whether the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound interval matches the set pattern (step S28).
  • when the pattern detection unit 53 determines that the shape of at least a portion of the frequency spectrum signal of the inaudible sound in the sound section matches the set pattern (step S28: YES), the processing device 4 proceeds to the process of step S29.
  • when the pattern detection unit 53 does not determine that the shape of at least a portion of the frequency spectrum signal of the inaudible sound in the sound section matches the set pattern (step S28: NO), the processing device 4 returns to the process of step S21.
  • upon acquiring the reproduction trigger and the section ID from the pattern detection unit 53, the reproduction unit 54 sets the replay flag to True (step S29).
  • the reproduction unit 54 acquires audible sound data associated with the interval ID from the interval buffer 52, and transmits the acquired audible sound data as reproduction data to the sound collector 3 by the communication unit 40 (step S30).
  • the reproduction unit 54 determines whether all reproduction data has been transmitted to the sound collector 3 (step S31). When determining that all the reproduction data has been transmitted to the sound collector 3 (step S31: YES), the reproduction unit 54 sets the replay flag to False (step S32). If the reproduction unit 54 does not determine that all the reproduction data has been transmitted to the sound collector 3 (step S31: NO), the process returns to step S30.
  • next, the control unit 50 uses the notification unit 42 to notify the user that a sound satisfying the preset condition has been detected (step S33).
  • the control unit 50 determines whether a reproduction instruction is received by the input unit 41 (step S34). When the control unit 50 determines that the reproduction instruction has been received by the input unit 41 (step S34: YES), the process proceeds to step S35 as shown in FIG. On the other hand, if the control unit 50 does not determine that the reproduction instruction has been received by the input unit 41 (step S34: NO), it repeatedly executes the process of step S34.
  • the control unit 50 may end the sound acquisition process if a predetermined period of time has elapsed while repeatedly performing the process of step S34. The predetermined time may be set based on the specifications of the processing device 4.
  • the control unit 50 executes steps S35, S36, S37, and S38 as shown in FIG. 9 in the same manner as or similar to steps S29, S30, S31, and S32 as shown in FIG. After the process in step S38, the control unit 50 returns to the process in step S21.
  • the control unit 18 of the output device 2 receives text data of audible sounds such as announcement sounds output from the output device 2 from the input unit 10.
  • the control unit 18 receives data of an audible sound such as an announcement sound output from the output device 2 from the input unit 11 .
  • the control unit 18 generates an inaudible sound containing the received audible sound information as a component, and the speaker 16 outputs a sound in which the audible range sound and the inaudible range sound are superimposed. Inaudible sounds cannot be heard by the average person. However, some people can hear inaudible sounds and may find them unpleasant.
  • the control unit 50 of the processing device 4 acquires external sound data by having the communication unit 40 receive the external sound data via the sound collector 3 .
  • the external sound data may include announcement sound data output by the output device 2.
  • the control unit 50 detects an audible sound such as an announcement sound outputted by the output device 2 from among the external sounds based on the inaudible sound component of the external sound.
  • inaudible sounds are rare in nature compared to audible sounds. In other words, most of the noise is audible. Therefore, even if the external sound includes noise, the control unit 50 can accurately detect the audible sound output by the output device 2 based on the inaudible sound component of the external sound.
  • as described above, the control unit 50 of the processing device 4 may notify the user that a sound satisfying the condition has been detected. With such a configuration, the user can know that a sound satisfying the condition has been detected.
  • the control unit 50 of the processing device 4 may also execute a playback process for playing back the data of the sound in the audible range of the detected sound section.
  • the playback unit 54 may transmit the data of the audible range sound in the sound section to the sound collector 3 as playback data, and the sound collector 3 may play the data.
  • in an airport waiting room, a station premises, or the like, users rarely pay attention to the announcement sound output by the output device 2. By reproducing the audible-range sound data of the sound section, the user can listen again to a missed announcement sound.
  • the control unit 50 of the processing device 4 executes the playback process for playing back the audible range of the external sound when the inaudible sound component of the external sound satisfies the preset condition.
  • the reproduction section 54 may acquire audible sound data corresponding to the section ID from the section buffer 52.
  • the reproduction unit 54 may transmit the acquired audible sound data to the sound collector 3 and have the sound collector 3 reproduce it. With such a configuration, the user can quickly hear the sound.
•   in another embodiment, a processing system includes the output device 2 shown in FIG. 2, the sound collector 3 shown in FIG. 4, and a processing device 104 shown in FIG. 10, which will be described later.
  • the generation unit 20 generates a sine wave based on fundamental frequency data for each frame, the same as or similar to the above-described embodiment.
  • the generation unit 20 generates the sine wave x(t) at time t using equation (2), for example.
•   a sine wave x(t) containing the fundamental frequency f0 as a component can be generated by equation (2).
•   x(t) = A sin{2πt × (n × f0 + X)} … (2)
  • the numerical value n is a numerical value of 1.0 or more.
•   the numerical value n is set within a range in which (n × f0 + X) does not exceed the sampling rate of the sound collector 3.
  • the numerical value n may be set based on the resolution of the fast Fourier transform performed by the processing device 104. For example, the lower the resolution of the fast Fourier transform, the larger the numerical value n may be set.
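•   A minimal sketch of per-frame sine generation according to equation (2) follows (hypothetical; the function name and the Nyquist check are assumptions — the text above only requires that (n × f0 + X) not exceed the sampling rate of the sound collector 3):

    import numpy as np

    def inaudible_sine(f0, n, X, A, num_samples, sample_rate):
        """x(t) = A * sin(2*pi*t*(n*f0 + X)) for one frame.

        f0 is the frame's fundamental frequency; n and X shift the
        carrier (n*f0 + X) into the inaudible band."""
        carrier = n * f0 + X
        # Stricter than the text's condition: stay below Nyquist so the
        # sound collector can actually represent the carrier.
        if carrier >= sample_rate / 2:
            raise ValueError("carrier too high for this sampling rate")
        t = np.arange(num_samples) / sample_rate
        return A * np.sin(2.0 * np.pi * t * carrier)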
•   after generating a sine wave for each frame, the generation unit 20 synthesizes the sine waves of the frames to generate inaudible sound data, in the same manner as or similar to the embodiment described above. Upon generating the inaudible sound data, the generation unit 20 outputs it to the superimposing unit 15, in the same manner as or similar to the embodiment described above.
•   the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data, in the same manner as or similar to the embodiment described above. However, in other embodiments, when superimposing the audible sound data and the inaudible sound data, the control unit 18 may correct the phase shift between the audible sound data and the inaudible sound data by phase adjustment. Further, the output device 2 may further include a high-pass filter (HPF) for removing noise in the audible range that occurs when the audible sound data and the inaudible sound data are superimposed.
  • the control unit 18 causes the speaker 16 to convert the sound data from the superimposition unit 15 into sound and output it, in the same way or similar to the embodiment described above.
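•   The superposition and the optional HPF described above might look like the following sketch (hypothetical; the 8th-order Butterworth design, the 18 [kHz] cutoff, and the crude length alignment are assumptions — the text mentions phase adjustment rather than truncation):

    import numpy as np
    from scipy.signal import butter, sosfilt

    SAMPLE_RATE = 48_000    # assumed
    HPF_CUTOFF_HZ = 18_000  # keep only the inaudible band of the generated signal

    # High-pass the generated inaudible data so that generation artifacts
    # do not leak noise into the audible range when superimposed.
    _HPF = butter(8, HPF_CUTOFF_HZ, btype="highpass", fs=SAMPLE_RATE, output="sos")

    def superimpose(audible: np.ndarray, inaudible: np.ndarray) -> np.ndarray:
        filtered = sosfilt(_HPF, inaudible)
        length = min(len(audible), len(filtered))
        return audible[:length] + filtered[:length]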
  • the other configurations of the output device 2 according to the other embodiments are the same as or similar to the configuration of the output device 2 according to the embodiment described above.
  • a processing device 104 includes a communication section 40, an input section 41, a notification section 42, a storage section 46, a filter section 47, and a control section 150.
  • the storage unit 46 stores keywords set in advance by the user.
  • the user sets keywords related to information that the user does not want to miss. For example, if the user does not want to miss information regarding "Flight 153,” the user sets a keyword "Flight 153" and stores it in the storage unit 46.
  • the storage unit 46 stores information about the constant X and the integer n in equation (2).
  • the control unit 150 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU or GPU, or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA or an ASIC.
•   the control unit 150 executes processing related to the operation of the processing device 104 while controlling each part of the processing device 104.
•   the control unit 150 includes a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61.
•   the buffer 55 may be a part of the storage unit 46. The operation of the buffer 55 is executed by the processor of the control unit 150 or the like.
  • the control unit 150 receives external sound data from the sound collector 3 through the communication unit 40 in the same manner as or similar to the embodiment described above.
  • the control unit 150 inputs the received external sound data to the filter unit 47 .
  • the filter unit 47 separates the external sound data into inaudible sound data and audible sound data in the same manner as or similar to the embodiment described above.
  • the filter section 47 outputs the inaudible sound data to the section detection section 51.
  • the filter section 47 outputs the audible sound data to the section buffer 52 and the buffer 55.
  • the section detection unit 51 acquires inaudible sound data from the filter unit 47 in the same manner as or similar to the embodiment described above.
  • the section detection unit 51 detects a sound section in which a sound continues in the audible sound data, based on the inaudible sound data, as in or similar to the embodiment described above.
  • the section detection unit 51 generates a section ID when detecting a sound section, in the same way or similar to the embodiment described above.
•   the section detection unit 51 outputs sound section information and the section ID to the section buffer 52, in the same manner as or similar to the embodiment described above. In this other embodiment, the section detection unit 51 also outputs the sound section information and the section ID to the buffer 55.
  • the section detecting section 51 extracts inaudible sound data in a sound section from among the inaudible sound data acquired from the filter section 47.
  • the section detection section 51 outputs the inaudible sound data and section ID of the extracted sound section to the first extraction section 56 .
•   the section detection unit 51 may acquire left inaudible sound data and right inaudible sound data from the filter unit 47. In this case, the section detection unit 51 may extract the left-side inaudible sound data of the sound section from the left inaudible sound data, and may extract the right-side inaudible sound data of the sound section from the right inaudible sound data.
  • the section detecting section 51 may output the left and right inaudible sound data and section ID of the extracted sound section to the first extracting section 56 .
•   the section buffer 52 acquires audible sound data from the filter unit 47 and acquires the sound section information and the section ID from the section detection unit 51, in the same manner as or similar to the embodiment described above. Based on the sound section information acquired from the section detection unit 51, the section buffer 52 extracts the audible sound data included in the sound section from among the audible sound data acquired from the filter unit 47, in the same manner as or similar to the embodiment described above. The section buffer 52 holds the extracted audible sound data of the sound section in association with the section ID.
  • the section buffer 52 outputs the audible sound data and section ID included in the extracted sound section to the removal unit 58.
  • the section buffer 52 may output the left and right audible sound data of the sound section and the section ID to the removal unit 58 .
  • the buffer 55 acquires audible sound data from the filter section 47.
  • the buffer 55 also acquires information on the sound interval and the interval ID from the interval detection unit 51.
  • the buffer 55 extracts audible sound data included in sections other than the sound section from the audible sound data obtained from the filter section 47 based on the sound section information obtained from the section detection section 51.
  • the buffer 55 holds data of audible sounds included in intervals other than the extracted sound interval.
  • the buffer 55 outputs the audible sound data and the section ID included in sections other than the extracted sound section to the second extraction unit 57.
  • the buffer 55 may acquire left and right audible sound data from the filter unit 47. In this case, the buffer 55 may extract left and right audible sound data included in intervals other than the sound interval. The buffer 55 may output data of left and right audible sounds included in a section other than the sound section and the section ID to the second extraction section 57.
  • the first extraction unit 56 acquires the inaudible sound data and the interval ID of the sound interval from the interval detection unit 51.
  • the first extraction unit 56 extracts the fundamental frequency of the audible sound in the sound interval from the acquired data on the inaudible sound in the sound interval.
•   the output device 2 generates inaudible sound data including information on the fundamental frequency of the audible sound using equation (2). Therefore, the inaudible sound data of a sound section includes information on the fundamental frequency of the audible sound in that section.
  • the first extraction unit 56 divides the inaudible sound data into frames having a predetermined frame length in the same way or similar to the process described above with reference to FIG. 3.
•   the first extraction unit 56 performs a fast Fourier transform on the sound data included in each frame, in the same manner as or similar to the processing of the section detection unit 51, and acquires a frequency spectrum signal for each frame.
  • the first extraction unit 56 performs fast Fourier transform on several thousand samples of sound data every several hundred samples.
•   upon acquiring the frequency spectrum signal for each frame, the first extraction unit 56 calculates the power of the frequency spectrum signal for each frame, for example by squaring the frequency spectrum signal. The first extraction unit 56 extracts, from among the frequency spectrum signals for each frame, the frequency spectrum signals whose power is greater than or equal to a power threshold.
  • the power threshold value may be the same as the power threshold value used by the section detection section 51, or may be set separately from the power threshold value used by the section detection section 51.
•   the first extraction unit 56 extracts fundamental frequency information from the extracted frequency spectrum signal. For example, the first extraction unit 56 extracts the fundamental frequency f0 in equation (2) using the extracted frequency spectrum signal and the information on the constant X and the numerical value n in equation (2) stored in the storage unit 46. The fundamental frequency extracted by the first extraction unit 56 in this manner is the fundamental frequency of the audible sound output by the output device 2.
  • the first extracting unit 56 outputs the extracted fundamental frequency information and section ID to the emphasizing unit 59.
  • the first extraction unit 56 may acquire the left and right inaudible sound data and the interval ID from the interval detection unit 51. In this case, the first extraction unit 56 may extract fundamental frequency information from either the left or right inaudible sound data of the sound section.
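•   A sketch of the fundamental-frequency recovery just described follows (hypothetical; it inverts equation (2) by locating the spectral peak, which assumes a single dominant carrier per frame):

    import numpy as np

    def extract_f0(frame, sample_rate, n, X, power_threshold):
        """Recover the audible fundamental from the inaudible carrier.

        The output device placed the carrier at (n*f0 + X), so the peak
        of the inaudible spectrum can be inverted to f0 = (peak - X) / n."""
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
        if spectrum.max() < power_threshold:
            return None  # frame too weak to trust
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        peak_hz = freqs[int(np.argmax(spectrum))]
        return (peak_hz - X) / n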
  • the second extraction unit 57 acquires the audible sound data and the section ID included in sections other than the sound section from the buffer 55.
  • the second extraction unit 57 extracts noise component information based on audible sound data included in an interval other than the acquired sound interval.
•   audible sound data included in sections other than a sound section is unlikely to be the audible sound data output from the output device 2; rather, it is highly likely to be noise data. Therefore, the second extraction unit 57 extracts noise component information based on the audible sound data included in sections other than the sound section.
  • An example of a process for extracting noise will be described below.
  • the second extraction unit 57 divides the audible sound data included in a section other than the sound section into frames having a predetermined frame length, in the same way or similar to the process described above with reference to FIG. 3.
•   the second extraction unit 57 performs a fast Fourier transform on the sound data included in each frame, in the same manner as or similar to the processing by the section detection unit 51, and acquires a frequency spectrum signal for each frame.
  • the second extraction unit 57 performs fast Fourier transform on several thousand samples of sound data every several hundred samples.
  • the second extraction unit 57 acquires the frequency spectrum signal for each frame as a noise frequency spectrum signal. In other words, the second extraction unit 57 acquires the frequency spectrum signal of the noise as information on the noise component.
•   the second extraction unit 57 outputs the noise frequency spectrum signal and the section ID to the removal unit 58.
•   the second extraction unit 57 may acquire, from the buffer 55, the left and right audible sound data included in sections other than the sound section, and the section ID. In this case, the second extraction unit 57 may extract left-side noise component information from the left audible sound data included in sections other than the sound section, and may extract right-side noise component information from the right audible sound data included in sections other than the sound section. The second extraction unit 57 may output the extracted left and right noise component information, for example the noise frequency spectrum signals, and the section ID to the removal unit 58.
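•   One way to realize this noise estimate is to average the magnitude spectra of frames taken outside sound sections, as in the following sketch (hypothetical; averaging is one common estimator, and the disclosure does not fix a particular one):

    import numpy as np

    def noise_spectrum(noise_frames):
        """Average magnitude spectrum of frames outside sound sections.

        Audible data outside a sound section is unlikely to be the
        announcement, so its average spectrum estimates the noise component."""
        window = np.hanning(len(noise_frames[0]))
        mags = [np.abs(np.fft.rfft(f * window)) for f in noise_frames]
        return np.mean(mags, axis=0)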
  • the removal unit 58 acquires the noise frequency spectrum signal and the section ID from the second extraction unit 57 as noise component information.
  • the removal unit 58 acquires the section ID and audible sound data from the section buffer 52.
  • the removal unit 58 removes the noise component from the audible sound data acquired from the section buffer 52 based on the information on the noise component. An example of processing for removing noise components will be described below.
  • the removing unit 58 divides the audible sound data of the sound section into frames having a predetermined frame length in the same way or similar to the process described above with reference to FIG. 3.
  • the removal unit 58 performs a fast Fourier transform on the sound data included in each frame in the same or similar manner as the process performed by the section detection unit 51, and obtains a frequency spectrum signal for each frame.
  • the removing unit 58 performs fast Fourier transform on several thousand samples of sound data every several hundred samples.
  • the removal unit 58 removes noise components from the frequency spectrum signal of the audible sound for each frame based on the frequency spectrum signal of the noise acquired from the second extraction unit 57.
  • the removal unit 58 may remove noise components from the frequency spectrum signal of the audible sound for each frame using any method such as a spectral subtraction method or a Wiener filter.
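•   A minimal spectral-subtraction sketch follows (hypothetical; magnitude subtraction with a small spectral floor and reuse of the noisy phase is one textbook variant of the method named above):

    import numpy as np

    def spectral_subtraction(frame, noise_mag, floor=0.01):
        """Remove the estimated noise magnitude from one audible frame.

        noise_mag must have rfft length len(frame)//2 + 1; the floor
        keeps magnitudes positive and limits "musical noise"."""
        spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
        mag = np.abs(spectrum)
        phase = np.angle(spectrum)
        cleaned = np.maximum(mag - noise_mag, floor * mag)
        return np.fft.irfft(cleaned * np.exp(1j * phase), n=len(frame))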
  • the removing unit 58 outputs the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID to the emphasizing unit 59.
  • the removing unit 58 may obtain left and right audible sound data and the section ID of the sound section from the section buffer 52.
  • the removal unit 58 may acquire information on the left and right noise components and the section ID from the second extraction unit 57.
  • the removal unit 58 may remove the left noise component from the left audible sound data of the sound section, and may remove the right noise component from the right audible sound data of the sound section.
  • the removal unit 58 may output left and right audible sound data from which noise components have been removed, such as frequency spectrum signals, and the section ID to the emphasizing unit 59.
  • the emphasizing unit 59 acquires the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID from the removing unit 58.
  • the emphasizing unit 59 acquires the fundamental frequency information and the section ID from the first extracting unit 56 .
•   based on the fundamental frequency information acquired from the first extraction unit 56, the emphasizing unit 59 emphasizes the frequency spectrum signal of the audible sound output by the output device 2 from among the frequency spectrum signals of the audible sound acquired from the removal unit 58.
•   in a typical person, the frequency of the vocal cord sound source produced by the vocal cords in the throat is the fundamental frequency.
•   a typical human voice is produced when the vocal tract, acting as a vocal tract filter, shapes the spectral envelope of the vocal cord source based on the fundamental frequency. That is, a typical human voice is generated by the convolution of integer multiples of the fundamental frequency, that is, overtones, with the vocal tract filter. Therefore, the emphasizing unit 59 increases the power [dB] of frequencies that are integral multiples of the fundamental frequency acquired from the first extraction unit 56 in the frequency spectrum signal of the audible sound acquired from the removal unit 58. As described above, the fundamental frequency acquired from the first extraction unit 56 is the fundamental frequency of the sound output by the output device 2. By increasing the power [dB] of frequencies that are integral multiples of the fundamental frequency in this way, the frequency spectrum signal of the audible sound output by the output device 2 can be emphasized.
•   for example, the emphasizing unit 59 increases the power [dB] in the frequency ranges of (m × f0 ± F), as shown in FIG. 11.
•   in FIG. 11, the horizontal axis indicates frequency, and the vertical axis indicates the power gain of the frequency spectrum signal.
•   the integer m is an integer satisfying 1 ≤ m ≤ M.
•   the integer M, which is the upper limit of the integer m, may be set depending on the environment in which the processing system 1 is used.
•   the fundamental frequency f0 is the fundamental frequency of the audible sound extracted by the first extraction unit 56.
•   the constant F is set based on the frequency fluctuation of the audible sound output from the output device 2, etc.
•   the power gain at the frequency (m × f0) is set to B times (B satisfies 1 < B).
•   the power gain at each frequency (m × f0 ± F) is set to 1.
•   the power gain is set so that it attenuates linearly from the frequency (m × f0) to the frequencies (m × f0 ± F).
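•   The triangular gain profile just described can be expressed as in the following sketch (hypothetical; taking the per-bin maximum across harmonics is an assumption for the case of overlapping ranges):

    import numpy as np

    def harmonic_gain(freqs, f0, B, F, M):
        """Per-bin power gain: B at each harmonic m*f0, decaying linearly
        to 1 at m*f0 +/- F, and 1 elsewhere, for m = 1..M."""
        gain = np.ones_like(freqs)
        for m in range(1, M + 1):
            dist = np.abs(freqs - m * f0)
            inside = dist <= F
            gain[inside] = np.maximum(gain[inside],
                                      B - (B - 1.0) * dist[inside] / F)
        return gain

    # usage (values assumed): power *= harmonic_gain(freqs, f0, B=4.0, F=20.0, M=10)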
•   after emphasizing the frequency spectrum signal of the audible sound output by the output device 2, the emphasizing unit 59 outputs the emphasized frequency spectrum signal of the audible sound and the section ID to the conversion unit 60.
  • the emphasizing unit 59 may acquire the frequency spectrum signals of the left and right audible sounds from which noise components have been removed and the section ID from the removing unit 58. In this case, the emphasizing unit 59 may emphasize each of the left and right audible sound frequency spectrum signals. The emphasizing section 59 may output the emphasized left and right audible sound frequency spectrum signals and the section ID to the converting section 60 .
  • the converting unit 60 acquires the frequency spectrum signal and the section ID of the audible sound from the emphasizing unit 59.
  • the converter 60 converts the audible sound frequency spectrum signal into time domain audible sound data by inverse short time Fourier transform (ISTFT).
  • the conversion unit 60 outputs the converted time domain audible sound data and the section ID to the playback unit 54 and the recognition unit 61.
  • the converting unit 60 may obtain the frequency spectrum signals of the left and right audible sounds and the section ID from the emphasizing unit 59. In this case, the conversion unit 60 converts the frequency spectrum signal of the left audible sound into time-domain left audible sound data, and converts the frequency spectrum signal of the right audible sound into time-domain right audible sound data. You can convert it to The converter 60 may output left and right audible sound data in the time domain to the playback unit 54 and the recognition unit 61 .
  • the recognition unit 61 acquires time-domain audible sound data and section ID from the conversion unit 60.
  • the recognition unit 61 acquires audible text data by performing speech recognition processing on the acquired audible sound data.
  • the recognition unit 61 determines whether the keyword stored in the storage unit 46 is included in the acquired audible sound text data. If the recognition unit 61 determines that the audible sound text data includes a keyword, it outputs a reproduction trigger and a section ID to the reproduction unit 54 .
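•   The keyword test itself can be as simple as the following sketch (hypothetical; substring matching is an assumption — the disclosure only requires determining whether the keyword is included in the text data):

    def keyword_detected(recognized_text, keywords):
        """True when any stored keyword (e.g. "Flight 153") appears in the
        text recognized from the de-noised, emphasized audible sound."""
        return any(keyword in recognized_text for keyword in keywords)

    # e.g. if keyword_detected(text, stored_keywords):
    #          output the reproduction trigger and section ID (as above)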
  • the recognition unit 61 may acquire left and right audible sound data in the time domain from the conversion unit 60, and may acquire text data of left and right audible sounds. In this case, if the recognition unit 61 determines that the keyword stored in the storage unit 46 is included in either the left or right audible sound text data, the recognition unit 61 outputs the reproduction trigger and section ID to the reproduction unit 54. You may do so.
  • the playback unit 54 acquires the converted time domain audible sound data and the section ID from the conversion unit 60.
  • the playback unit 54 acquires the playback trigger and section ID from the recognition unit 61.
  • the reproduction unit 54 executes reproduction processing.
  • the playback unit 54 sets the replay flag to True.
  • the playback section 54 uses, as playback data, audible sound data having the same section ID as the section ID obtained from the recognition section 61 together with the playback trigger.
  • the reproduction section 54 transmits reproduction data to the sound collector 3 through the communication section 40 . After transmitting all the reproduction data to the sound collector 3, the reproduction unit 54 sets the replay flag to False.
  • the playback unit 54 may obtain the left and right audible sound data and the section ID in the time domain from the conversion unit 60. In this case, the playback unit 54 may use left and right audible sound data in the time domain as playback data.
  • the other configuration of the processing device 104 is the same as or similar to the configuration of the processing device 4 according to the embodiment described above.
•   the operation of the output device 2 according to the other embodiment will be explained using the flowchart shown in FIG. 6. However, in the process of step S5, the generation unit 20 generates a sine wave using equation (2).
  • FIG. 12 is a flowchart showing the flow of sound acquisition processing executed by the processing device 104 according to another embodiment. For example, when the transmission of external sound digital data from the sound collector 3 to the processing device 104 is started, the control unit 150 starts the process of step S41.
•   the processing device 104 executes the processes of steps S41 to S44 in the same manner as or similar to the processes of steps S21 to S24 shown in FIG. 8.
  • the first extraction unit 56 acquires the inaudible sound data and the interval ID of the sound interval from the interval detection unit 51.
  • the first extraction unit 56 extracts the fundamental frequency of the audible sound in the sound interval from the acquired data on the inaudible sound in the sound interval.
  • the second extraction unit 57 acquires the audible sound data and the section ID included in the section other than the sound section from the buffer 55.
  • the second extraction unit 57 extracts noise component information based on audible sound data included in an interval other than the acquired sound interval.
  • the removal unit 58 acquires the section ID and audible sound data from the section buffer 52.
  • the removal unit 58 removes the noise component from the audible sound data obtained from the section buffer 52 based on the noise component information obtained in the process of step S46.
•   the emphasizing unit 59 increases the power of sounds with frequencies that are integer multiples of the fundamental frequency extracted in the process of step S45, in the frequency spectrum signal of the audible sound from which the noise component was removed in the process of step S47.
  • the conversion unit 60 converts the frequency spectrum signal of the audible sound on which the process of step S48 was executed into time domain audible sound data.
  • the recognition unit 61 acquires audible text data by performing speech recognition processing on time domain audible sound data.
•   in step S50, the recognition unit 61 determines whether the keyword stored in the storage unit 46 is included in the audible sound text data acquired in the process of step S49. If the recognition unit 61 determines that the audible sound text data includes the keyword (step S50: YES), the process proceeds to step S29 shown in FIG. 8. If the recognition unit 61 determines that the audible sound text data does not include the keyword (step S50: NO), the processing device 104 returns to the process of step S41.
  • the first extraction unit 56 of the control unit 150 extracts fundamental frequency information from the data of the sound in the inaudible range of the sound interval. Furthermore, the emphasizing unit 59 of the control unit 150 increases the power of sounds with frequencies that are integral multiples of the extracted fundamental frequency, among the audible sounds in the sound section. With such a configuration, the power of the audible sound output by the output device 2 among the audible sounds collected by the sound collector 3 can be increased. By increasing the power of the audible sound output by the output device 2, the influence of noise on the audible sound is reduced.
  • the recognition unit 61 can accurately determine whether or not the audible sound text data includes preset text data.
  • the second extraction unit 57 of the control unit 150 may extract information on noise components based on data of sounds in the audible range included in sections other than the sound sections. Further, the removal unit 58 of the control unit 150 may remove noise components from the audible sound data of the sound section. Such a configuration reduces the influence of noise on audible sounds. Since the influence of noise on the audible sound is reduced, when the audible sound is reproduced as reproduction data, the user can accurately hear the sound such as the announcement sound outputted by the output device 2.
•   each functional unit, each means, each step, and the like may be added to other embodiments, or replaced with a functional unit, means, step, or the like of another embodiment, so long as no logical contradiction arises. Furthermore, in each embodiment, a plurality of functional units, means, steps, and the like can be combined into one or divided. Furthermore, the embodiments of the present disclosure described above are not limited to being implemented faithfully to each of the described embodiments; they may be implemented by combining features or omitting some features as appropriate.
  • the audible sound data input from the input unit 11 of the output device 2 and the audible sound data converted by the converting unit 12 are described as digital sound data.
  • the audible sound data input from the input unit 11 of the output device 2 and the audible sound data converted by the converting unit 12 may be analog sound data.
  • the input section 11 may include a microphone and the like. That is, the user may input audible sound data, which is analog data, from the microphone of the input unit 11 .
  • the extraction unit 19 may obtain audible sound data as digital data by sampling audible sound data as analog data at a preset sampling rate.
  • the sound collector 3 and the processing device 4 are described as being separate devices, as shown in FIG. 4. Further, as shown in FIG. 10, the sound collector 3 and the processing device 104 have been described as being separate devices. However, the sound collector 3 and the processing device 4 may be configured as an integrated device. Further, the sound collector 3 and the processing device 104 may be configured as an integrated device. An example of configuring the sound collector 3 and the processing device 4 as an integrated device will be described with reference to FIG. 13.
  • the sound collector 103 as shown in FIG. 13 may be an earphone.
  • the sound collector 103 is a combination of the sound collector 3 and the processing device 4.
  • the sound collector 103 can also be called a processing device.
  • the sound collector 103 includes a microphone 30, a speaker 31, a storage section 33, a filter section 47, and a control section 34.
  • the control section 34 includes an acquisition section 35 , a reproduction section 36 , a section detection section 51 , a section buffer 52 , and a pattern detection section 53 .
•   when the section detection unit 51 detects a sound section, it outputs the sound section information to the section buffer 52 and the pattern detection unit 53.
  • the section buffer 52 extracts audible sound data included in the sound section from among the audible sound data obtained from the filter section 47 based on the information on the sound section obtained from the section detection section 51.
  • the section buffer 52 holds audible sound data of the extracted sound section.
  • the pattern detection section 53 determines whether inaudible sound data in the extracted sound section satisfies preset conditions. When the pattern detection unit 53 determines that the condition is satisfied, it outputs a reproduction trigger to the reproduction unit 36.
•   when the reproduction unit 36 obtains a reproduction trigger from the pattern detection unit 53, it acquires the audible sound data of the latest sound section from among the audible sound data of the sound sections held in the section buffer 52.
  • the reproduction unit 36 causes the speaker 31 to output the audible sound data of the acquired sound section.
•   other configurations and effects of the sound collector 103 are the same as or similar to those of the sound collector 3 and the processing device 4 shown in FIG. 4.
•   when the sound collector 3 and the processing device 104 are configured as an integrated device, the control unit may include a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61.
•   the processing device includes a control unit that, when acquiring external sound data in which information on an audible-range sound is included in an inaudible-range sound component, detects the sound in the audible range based on the inaudible-range sound component.
  • the control unit may detect a sound section in which the sound continues in the audible range based on the component of the sound in the inaudible range.
  • the control unit may hold data of sounds in an audible range in the sound section.
•   the control unit may notify the user that a sound satisfying the condition has been detected.
•   the control unit may execute a playback process to play back the sound data in the audible range of the sound section.
  • the control unit may execute a reproduction process of reproducing the sound in the audible range when the component of the sound in the inaudible range satisfies a preset condition.
  • the information on the sound in the audible range may be information on the fundamental frequency of the sound in the audible range.
•   the condition may be that at least a part of the frequency spectrum signal of the inaudible sound matches a set pattern.
•   the set pattern may be generated based on preset audible sound data.
•   when the information on the sound in the audible range is information on the fundamental frequency of the sound in the audible range, the control unit may extract the fundamental frequency information from the data of the inaudible-range sound of the sound section, and may increase the power of sounds with frequencies that are integral multiples of the fundamental frequency among the sounds in the audible range of the sound section.
•   the control unit may extract noise component information based on the data of the audible-range sound included in sections other than the sound section, and may remove the noise component from the audible-range sound data of the sound section.
•   the control unit may obtain text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and, if the acquired text data includes a preset keyword, may execute a reproduction process to reproduce the audible-range sound data of the sound section.
•   the control unit may obtain text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and, if the acquired text data includes a preset keyword, may notify the user that a sound including the keyword has been detected.
•   the control unit may divide the external sound into inaudible-range sound data and audible-range sound data using the filter unit.
•   the output device includes: a speaker; and a control unit that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed through the speaker.
•   the processing system includes: an output device that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and a processing device that, when acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
•   in the present disclosure, descriptions such as “first” and “second” are identifiers for distinguishing the configurations.
•   the numbers of configurations distinguished by descriptions such as “first” and “second” in the present disclosure can be exchanged. For example, the first extraction unit 56 can exchange the identifiers “first” and “second” with the second extraction unit 57.
•   the exchange of identifiers takes place simultaneously, and the configurations remain distinguished after the exchange.
•   identifiers may be removed; configurations whose identifiers are removed are distinguished by reference signs.
•   the description of identifiers such as “first” and “second” in the present disclosure shall not be used to interpret the order of the configurations or as grounds for the existence of an identifier with a smaller number.
•   1 processing system; 2 output device; 3, 103 sound collector; 4, 104 processing device; 10 input section; 11 input section; 12 conversion section; 13 switch; 14 delay buffer; 15 superposition section; 16 speaker; 17 storage section; 18 control section; 19 extraction section; 20 generation section; 30 microphone; 31 speaker; 32 communication section; 33 storage section; 34 control section; 35 acquisition section; 36 reproduction section; 37 storage section; 40 communication section; 41 input section; 42 notification section; 43 display section; 44 vibration section; 45 light emitting section; 46 storage section; 47 filter section; 50, 150 control section; 51 section detection section; 52 section buffer; 53 pattern detection section; 54 reproduction section; 55 buffer; 56 first extraction section; 57 second extraction section; 58 removal section; 59 emphasis section; 60 conversion section; 61 recognition section


Abstract

This processing device comprises a control unit. When data of external sound, in which information of audible-range sound is contained in the components of inaudible-range sound, is acquired, sound in the audible range is detected according to the components of the inaudible-range sound.

Description

Processing device, output device, and processing system

Cross-reference to related applications

This application claims priority to Japanese Patent Application No. 2022-147250 filed in Japan on September 15, 2022, and the entire disclosure of this earlier application is incorporated herein by reference.

The present disclosure relates to a processing device, an output device, and a processing system.

Conventionally, techniques are known that allow a user to listen to surrounding sounds while wearing a processing device such as headphones or earphones. In such technology, a portable music playback device is known that includes a notification unit that notifies the user of a match through the headphones when an external sound matches a predetermined phrase (Patent Document 1).

Patent Document 1: Japanese Patent Application Publication No. 2001-256771
A processing device according to an embodiment of the present disclosure includes a control unit that, when acquiring external sound data in which information on an audible-range sound is included in an inaudible-range sound component, detects the sound in the audible range based on the inaudible-range sound component.

An output device according to an embodiment of the present disclosure includes a speaker, and a control unit that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed through the speaker.

A processing system according to an embodiment of the present disclosure includes: an output device that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and a processing device that, when acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
FIG. 1 is a diagram showing a schematic configuration of a processing system according to an embodiment of the present disclosure.
FIG. 2 is a block diagram of the output device shown in FIG. 1.
FIG. 3 is a diagram for explaining frame division.
FIG. 4 is a block diagram of the sound collector and processing device shown in FIG. 1.
FIG. 5 is a graph of a frequency spectrum signal.
FIG. 6 is a flowchart showing the flow of sound output processing executed by the output device shown in FIG. 2.
FIG. 7 is a flowchart showing the flow of sound collection processing executed by the sound collector shown in FIG. 4.
FIG. 8 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
FIG. 9 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
FIG. 10 is a block diagram of a processing device according to another embodiment.
FIG. 11 is a diagram for explaining processing for emphasizing overtones.
FIG. 12 is a flowchart showing the flow of sound acquisition processing executed by a processing device according to another embodiment.
FIG. 13 is a block diagram of a modification of the sound collector (processing device) shown in FIG. 4.
There is room for improvement in the conventional technology. For example, a user may use a processing device, such as headphones or earphones, in noisy conditions. In this case, it may become difficult for the processing device to distinguish between the noise and the voice to be detected. According to an embodiment of the present disclosure, a technique for detecting sound with high accuracy can be provided.

Hereinafter, embodiments according to the present disclosure will be described with reference to the drawings.
(Processing system configuration)

As shown in FIG. 1, the processing system 1 includes an output device 2, a sound collector 3, and a processing device 4. In this embodiment, the sound collector 3 and the processing device 4 are configured as separate devices. However, the sound collector 3 and the processing device 4 may be configured as an integrated device as shown in FIG. 13, which will be described later.

The output device 2 is used, for example, in an airport waiting room or a station premises. The output device 2 may be part of a public address system installed in a building.

The output device 2 outputs audible sound. For example, the output device 2 outputs an audible announcement sound for notifying the estimated arrival time of an airplane or train. Audible sounds are sounds that can be heard by the average human ear. The frequency band of the audible sound, that is, the audible range, is, for example, a band of 20 [Hz] to 18 [kHz].

The output device 2 outputs inaudible sounds along with audible sounds. Inaudible sounds are sounds that cannot be heard by the average human ear. The frequency band of the inaudible sound, that is, the inaudible range, is, for example, a band of 20 [Hz] or less, a band of 18 [kHz] to 22 [kHz], or a band of 22 [kHz] or more. In this embodiment, the output device 2 outputs inaudible sound in a band of 18 [kHz] to 22 [kHz].

The inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Since the inaudible sound component includes audible sound information, the processing device 4 can accurately detect the audible sound output by the output device 2 even under noisy conditions, as will be described later.

The sound collector 3 is, for example, an earphone. However, the sound collector 3 is not limited to earphones. The sound collector 3 may be headphones or the like. The sound collector 3 is worn by the user. The sound collector 3 can output music and the like to the user. The sound collector 3 may include an earphone section that is attached to the user's left ear, and an earphone section that is attached to the user's right ear.

The sound collector 3 collects external sounds around the sound collector 3. External sound is sound emitted outside the sound collector 3. The sound collector 3 collects external sounds around the user by being worn by the user. That is, external sounds include sounds emitted around the user. External sounds may include sounds made by the user himself. The sound collector 3 outputs the collected external sounds around the user to the user under the control of the processing device 4. With this configuration, the user can hear external sounds around him while wearing the sound collector 3.

The processing device 4 is, for example, a smartphone, a mobile phone, a tablet, or a personal computer (PC). However, the processing device 4 is not limited to this.

The processing device 4 is operated by a user. The user can operate the processing device 4 to make settings for the sound collector 3 and the like.

The processing device 4 acquires data on external sounds collected by the sound collector 3. The external sound collected by the sound collector 3 may include noise in addition to the audible sound and inaudible sound output by the output device 2. Here, inaudible sounds are rare in nature compared to audible sounds. Therefore, most of the noise is audible, and the inaudible sound output by the output device 2 is not easily affected by noise. Further, as described above, the inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Therefore, by analyzing the inaudible sound of the external sound collected by the sound collector 3, the processing device 4 can accurately detect the audible sound output by the output device 2 even if the external sound includes noise.
(Configuration of output device)

As shown in FIG. 2, the output device 2 includes an input unit 10, an input unit 11, a conversion unit 12, a switch 13, a delay buffer 14, a superimposition unit 15, a speaker 16, a storage unit 17, and a control unit 18.

In the following, digital sound data refers to data obtained by sampling analog sound data at a preset sampling rate. Analog sound data refers to sound data collected by a microphone or the like.

The input unit 10 can receive text data input from the user. The input unit 10 includes at least one input interface that can accept input of text data. The input interface includes, for example, a keyboard. The input unit 10 may receive text data input from another device. In this case, the input unit 10 may be configured to include at least one connection interface connectable to other devices. The connection interface is an interface compatible with standards such as USB (Universal Serial Bus), HDMI (registered trademark) (High-Definition Multimedia Interface), or Bluetooth (registered trademark).

The input unit 11 can receive input of audible sound data from other devices. It is assumed that the audible sound data input from the input unit 11 is digital audible sound data. The input unit 11 includes at least one connection interface connectable to other devices. The connection interface may be the same as or similar to that of the input unit 10.

The conversion unit 12 acquires text data from the input unit 10 under the control of the control unit 18. The conversion unit 12 converts the text data into audible sound data under the control of the control unit 18. For example, the conversion unit 12 converts text data into audible sound data by text-to-speech synthesis. Speech data used for text-to-speech synthesis may be stored in the storage unit 17. It is assumed that the audible sound data after conversion is digital sound data.

The switch 13 is connected between the conversion unit 12, the input unit 11, the delay buffer 14, and the control unit 18. The switch 13 switches the electrical connection relationship between the input unit 11, the conversion unit 12, the delay buffer 14, and the control unit 18 based on the control of the control unit 18. The switch 13 includes, for example, an arbitrary switching element such as a transistor.

The delay buffer 14 is a temporary storage memory. The delay buffer 14 acquires audible sound data from the switch 13 under the control of the control unit 18. The delay buffer 14 holds the acquired audible sound data for a predetermined period of time. The predetermined time is, for example, the time from when the extraction unit 19 (described later) acquires audible sound data from the switch 13 until the generation unit 20 (described later) outputs inaudible sound data to the superimposition unit 15. After holding the audible sound data for the predetermined time, the delay buffer 14 outputs the audible sound data to the superimposition unit 15. The delay buffer 14 is configured to include the same or similar components as the storage unit 17, which will be described later. The delay buffer 14 may be a part of the storage unit 17.

The superimposition unit 15 acquires audible sound data from the delay buffer 14 under the control of the control unit 18. The superimposition unit 15 acquires inaudible sound data from the control unit 18 under the control of the control unit 18. The superimposition unit 15 superimposes the audible sound data from the delay buffer 14 and the inaudible sound data from the control unit 18 under the control of the control unit 18. The superimposition unit 15 outputs to the speaker 16 the sound data in which the audible sound data and the inaudible sound data are superimposed.

The speaker 16 is capable of outputting sound. The speaker 16 is, for example, a loudspeaker that can convert electrical signals into sound. The speaker 16 acquires sound data, which is an electrical signal, from the superimposition unit 15 under the control of the control unit 18. Under the control of the control unit 18, the speaker 16 converts the acquired sound data into sound and outputs the sound.

The storage unit 17 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, a RAM (Random Access Memory) or a ROM (Read Only Memory). The RAM is, for example, an SRAM (Static Random Access Memory) or a DRAM (Dynamic Random Access Memory). The ROM is, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory). The storage unit 17 may function as a main storage device, an auxiliary storage device, or a cache memory. The storage unit 17 stores data used for the operation of the output device 2 and data obtained by the operation of the output device 2. For example, the storage unit 17 stores the speech data used by the conversion unit 12 for text-to-speech synthesis.

The control unit 18 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). The control unit 18 executes processing related to the operation of the output device 2 while controlling each part of the output device 2.

The control unit 18 acquires audible sound data from the input unit 11 or the conversion unit 12 via the switch 13. Upon acquiring the audible sound data, the control unit 18 generates inaudible sound data that includes the audible sound information as a sound component. In order to execute this process, in this embodiment, the control unit 18 includes an extraction unit 19 and a generation unit 20. Details of the processing by the extraction unit 19 and the generation unit 20 will be described later.
 <Sound output processing>
 The sound output processing executed by the output device 2 will be described below.
 First, the user inputs, from the input unit 10, text data of an announcement sound to be output from the output device 2. Alternatively, the user inputs, from the input unit 11, data of an audible sound that is the announcement sound to be output from the output device 2.
 When the text data of the announcement sound is input from the input unit 10, the control unit 18 receives the input of the text data through the input unit 10. The control unit 18 causes the conversion unit 12 to convert the received text data into audible sound data. The control unit 18 electrically connects the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13. With this configuration, the audible sound data converted by the conversion unit 12 is output from the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 When the data of an audible sound that is the announcement sound is input from the input unit 11, the control unit 18 receives the input of the audible sound data through the input unit 11. Upon receiving the input of the audible sound data, the control unit 18 electrically connects the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13. With this configuration, the audible sound data received through the input unit 11 is output from the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 The extraction unit 19 acquires the audible sound data from the input unit 11 or the conversion unit 12 via the switch 13. From the acquired audible sound data, the extraction unit 19 extracts the audible sound information to be included as a component of the inaudible sound output from the output device 2. In this embodiment, the extraction unit 19 extracts the fundamental frequency of the audible sound as the audible sound information to be included as a component of the inaudible sound. In this embodiment, the extraction unit 19 extracts the fundamental frequency from the audible sound data by a short-time Fourier transform (STFT). However, the extraction unit 19 may extract the fundamental frequency from the audible sound data by any method. This processing by the extraction unit 19 will be described below.
 The extraction unit 19 divides the audible sound data into frames each having a predetermined frame length. For example, the extraction unit 19 divides the audible sound data such that one frame contains several hundred to several thousand sound samples. The extraction unit 19 divides the audible sound data into frames at intervals of 1/2 to 1/4 of the frame length. This will be described below with reference to FIG. 3.
 The left side of FIG. 3 shows a graph of the audible sound data, with time on the horizontal axis. In FIG. 3, the frame length is the frame length L1. The extraction unit 19 divides the audible sound data into frames at intervals of 1/3 of the frame length L1, thereby obtaining the frames Fr1, Fr2, and Fr3 shown on the right side of FIG. 3.
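 As an illustration, this frame division can be sketched as follows in Python; the function name split_frames, the example frame length, and the NumPy dependency are assumptions for illustration, not details given in the embodiment.

```python
import numpy as np

def split_frames(samples: np.ndarray, frame_len: int) -> list[np.ndarray]:
    # Shift by one third of the frame length, as in FIG. 3.
    hop = frame_len // 3
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]

# Example: 1024-sample frames from one second of (silent) 48 kHz audio.
frames = split_frames(np.zeros(48000), 1024)
```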
 The extraction unit 19 extracts, for each frame, the fundamental frequency from the audible sound samples contained in the frame. For example, the extraction unit 19 may extract the fundamental frequency from the period at which the autocorrelation function of the audible sound samples contained in the frame takes its peak. Alternatively, the extraction unit 19 may extract the fundamental frequency by the cepstrum method.
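 The autocorrelation-based method mentioned above could, for example, be sketched as follows; the search range of 80 to 400 Hz and the function name are assumed parameter choices, not values given in the embodiment.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, fs: float,
                f_min: float = 80.0, f_max: float = 400.0) -> float:
    frame = frame - frame.mean()                       # remove DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(fs / f_max)                           # shortest candidate period
    lag_hi = min(int(fs / f_min), len(ac) - 1)         # longest candidate period
    lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi + 1]))
    return fs / lag                                    # F0 = 1 / peak period
```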
 The extraction unit 19 outputs the fundamental frequency data extracted for each frame to the generation unit 20.
 The generation unit 20 acquires the fundamental frequency data for each frame from the extraction unit 19. The generation unit 20 generates inaudible sound data that includes the fundamental frequency, which is the audible sound information, as a sound component.
 In this embodiment, the generation unit 20 generates a sine wave for each frame based on the fundamental frequency data. This sine wave constitutes the inaudible sound data. By synthesizing these sine waves as described below, the inaudible sound data to be output from the output device 2 is generated. For example, the generation unit 20 generates the sine wave x(t) at time t using formula (1), which yields a sine wave having the fundamental frequency f0 as a component.
   x(t) = A sin{2πt·(f0 + X)}   Formula (1)
 In formula (1), the amplitude A may be set based on the intensity of the audible sound data.
 In formula (1), the constant X may be set based on the inaudible range used in the processing system 1. The constant X is, for example, 18 [kHz].
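 A minimal sketch of formula (1) in Python follows; the function name and example parameter values are assumptions, with the shift X set to the 18 [kHz] example above. Note that the sampling rate must exceed 2·(f0 + X) for the shifted tone to be representable.

```python
import numpy as np

def inaudible_sine(f0: float, n_samples: int, fs: float,
                   amplitude: float = 1.0, x_shift: float = 18_000.0) -> np.ndarray:
    t = np.arange(n_samples) / fs        # discrete time axis in seconds
    # x(t) = A sin{2πt(f0 + X)}: the fundamental frequency carried at f0 + X.
    return amplitude * np.sin(2.0 * np.pi * t * (f0 + x_shift))

tone = inaudible_sine(f0=200.0, n_samples=1024, fs=48_000.0)
```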
 After generating a sine wave for each frame, the generation unit 20 synthesizes the per-frame sine waves to generate the inaudible sound data. The generation unit 20 may multiply the sine wave at the frame ends by a window function before synthesizing the per-frame sine waves. The window function is, for example, a Hamming window function. Multiplying the sine wave at the frame ends by a window function reduces the aliasing noise at the frame ends.
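 The windowed synthesis described above can be sketched as an overlap-add of the per-frame sine waves; the Hamming window follows the text, while the hop length and function name are illustrative assumptions.

```python
import numpy as np

def overlap_add(frames: list[np.ndarray], hop: int) -> np.ndarray:
    frame_len = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    window = np.hamming(frame_len)       # tapers the frame ends
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += window * frame
    return out
```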
 After generating the inaudible sound data, the generation unit 20 outputs the generated inaudible sound data to the superimposition unit 15.
 The control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data, and causes the superimposition unit 15 to output the generated sound data to the speaker 16. The control unit 18 causes the speaker 16 to convert the sound data from the superimposition unit 15 into sound and output it.
 (Configuration of the sound collector)
 As shown in FIG. 4, the sound collector 3 includes a microphone 30, a speaker 31, a communication unit 32, a storage unit 33, and a control unit 34. In FIG. 4, the main flows of data are shown by solid lines.
 The microphone 30 can collect external sounds around the sound collector 3. The microphone 30 includes a left microphone and a right microphone. The left microphone may be included in an earphone unit of the sound collector 3 that is worn on the user's left ear. The right microphone may be included in an earphone unit of the sound collector 3 that is worn on the user's right ear. For example, the microphone 30 is a stereo microphone or the like.
 The speaker 31 can output sound. The speaker 31 includes a left speaker and a right speaker. The left speaker may be included in an earphone unit of the sound collector 3 that is worn on the user's left ear. The right speaker may be included in an earphone unit of the sound collector 3 that is worn on the user's right ear. For example, the speaker 31 is a stereo speaker or the like.
 The communication unit 32 includes at least one communication module capable of communicating with the processing device 4 via a communication line. The communication module conforms to the standard of the communication line. The standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard such as Bluetooth (registered trademark), infrared, or NFC (Near Field Communication).
 The storage unit 33 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, a RAM or a ROM. The RAM is, for example, an SRAM or a DRAM. The ROM is, for example, an EEPROM. The storage unit 33 may function as a main storage device, an auxiliary storage device, or a cache memory. The storage unit 33 stores data used for the operation of the sound collector 3 and data obtained by the operation of the sound collector 3. For example, the storage unit 33 stores system programs, application programs, embedded software, and the like.
 The control unit 34 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU or a GPU, or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA or an ASIC. The control unit 34 executes processing related to the operation of the sound collector 3 while controlling each part of the sound collector 3.
 The control unit 34 includes an acquisition unit 35, a reproduction unit 36, and an accumulation unit 37. The accumulation unit 37 includes the same or similar components as the storage unit 33. At least a part of the accumulation unit 37 may be a part of the storage unit 33. The operations of the accumulation unit 37 are executed by the processor or the like of the control unit 34.
 The acquisition unit 35 acquires digital data of the external sound from the analog data of the external sound collected by the microphone 30. For example, the acquisition unit 35 acquires the digital data of the external sound by sampling the analog data of the external sound at a preset sampling rate.
 The acquisition unit 35 outputs the digital data of the external sound to the reproduction unit 36. The acquisition unit 35 also transmits the digital data of the external sound to the processing device 4 through the communication unit 32.
 When the microphone 30 includes the left microphone and the right microphone, the acquisition unit 35 may acquire digital data of the left external sound from the analog data of the external sound collected by the left microphone, and may acquire digital data of the right external sound from the analog data of the external sound collected by the right microphone. The acquisition unit 35 may output the digital data of the left external sound and the digital data of the right external sound to the reproduction unit 36, and may transmit them to the processing device 4 through the communication unit 32. Hereinafter, when the digital data of the left external sound and the digital data of the right external sound need not be distinguished, they are simply referred to as the "digital data of the external sound".
 The reproduction unit 36 acquires the digital data of the external sound from the acquisition unit 35. The reproduction unit 36 receives a replay flag from the processing device 4 through the communication unit 32.
 The replay flag is set to True or False by the processing device 4, as described later.
 When the replay flag is False, the sound collector 3 and the processing device 4 operate in a through mode. The through mode is a mode in which the external sound collected by the sound collector 3 is output from the sound collector 3 to the user without passing through the processing device 4. When the sound collector 3 and the like operate in the through mode, the user can hear the surrounding external sounds while wearing the sound collector 3.
 When the replay flag is True, the sound collector 3 and the processing device 4 operate in a reproduction mode. The reproduction mode is a mode in which the sound collector 3 outputs the reproduction data acquired from the processing device 4 through the speaker 31.
 When the replay flag is False, the reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35. When the reproduction unit 36 acquires the digital data of the left and right external sounds from the acquisition unit 35, it may cause the left speaker of the speaker 31 to output the digital data of the left external sound and the right speaker of the speaker 31 to output the digital data of the right external sound.
 When the replay flag is True, the reproduction unit 36 causes the speaker 31 to output the reproduction data accumulated in the accumulation unit 37.
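 The branching of the reproduction unit 36 on the replay flag can be summarized in the following minimal sketch; the callable names are illustrative abstractions of the speaker output, not part of the embodiment.

```python
def route_output(replay_flag: bool, external_sound, playback_data, speaker) -> None:
    if replay_flag:
        speaker(playback_data)    # reproduction mode: play accumulated data
    else:
        speaker(external_sound)   # through mode: pass the collected sound through
```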
 The accumulation unit 37 receives the reproduction data from the processing device 4 through the communication unit 32 and accumulates the received reproduction data.
 The accumulation unit 37 may receive left reproduction data and right reproduction data from the processing device 4 and accumulate them. In this case, the reproduction unit 36 may cause the left speaker of the speaker 31 to output the left reproduction data accumulated in the accumulation unit 37 and the right speaker of the speaker 31 to output the right reproduction data accumulated in the accumulation unit 37.
 (Configuration of the processing device)
 As shown in FIG. 4, the processing device 4 includes a communication unit 40, an input unit 41, a notification unit 42, a storage unit 46, a filter unit 47, and a control unit 50.
 The communication unit 40 includes at least one communication module capable of communicating with the sound collector 3 via a communication line. The communication module conforms to the standard of the communication line. The standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard such as Bluetooth (registered trademark), infrared, or NFC.
 The input unit 41 can accept input from the user. The input unit 41 includes at least one input interface capable of accepting input from the user. The input interface is, for example, a physical key, a capacitive key, a pointing device, a touch screen provided integrally with the display of the display unit 43, a microphone, or the like.
 The notification unit 42 can notify the user of information. The notification unit 42 includes a display unit 43, a vibration unit 44, and a light emitting unit 45. However, the components of the notification unit 42 are not limited to these; the notification unit 42 may include any component capable of notifying the user of information.
 The display unit 43 can display data. By displaying data, the display unit 43 notifies the user of information corresponding to the data. The display unit 43 is, for example, a display such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display.
 The vibration unit 44 can vibrate the processing device 4. By vibrating the processing device 4, the vibration unit 44 notifies the user of information corresponding to the vibration mode. The vibration unit 44 includes, for example, a vibration element such as a piezoelectric element.
 The light emitting unit 45 can emit light. By emitting light, the light emitting unit 45 notifies the user of information corresponding to the light emission mode. The light emitting unit 45 includes, for example, an LED (Light Emitting Diode) or the like.
 The storage unit 46 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, a RAM or a ROM. The RAM is, for example, an SRAM or a DRAM. The ROM is, for example, an EEPROM. The storage unit 46 may function as a main storage device, an auxiliary storage device, or a cache memory. The storage unit 46 stores data used for the operation of the processing device 4 and data obtained by the operation of the processing device 4. For example, the storage unit 46 stores system programs, application programs, embedded software, and the like. For example, the storage unit 46 stores a setting pattern described later.
 The filter unit 47 includes, for example, a band-pass filter for the audible range through which only audible sound data can pass, and a band-pass filter for the inaudible range through which only inaudible sound data can pass. The band-pass filter for the audible range passes only sound data of, for example, 20 [Hz] to 10 [kHz]. The band-pass filter for the inaudible range passes sound data of, for example, 18 [kHz] to 22 [kHz]. When the control unit 50 receives the digital data of the external sound from the sound collector 3 through the communication unit 40, it causes the filter unit 47 to separate the received digital data of the external sound into inaudible sound data and audible sound data. The filter unit 47 outputs the inaudible sound data to the section detection unit 51 and the pattern detection unit 53, and outputs the audible sound data to the section buffer 52.
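 As an illustration, the band separation performed by the filter unit 47 could be realized as follows; the Butterworth design, its order, and the SciPy dependency are assumptions, while the band edges follow the example values above.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(x: np.ndarray, fs: float = 48_000.0):
    # Band edges follow the example values in the text (assumed fs of 48 kHz).
    sos_audible = butter(4, [20.0, 10_000.0], btype="bandpass", fs=fs, output="sos")
    sos_inaudible = butter(4, [18_000.0, 22_000.0], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos_audible, x), sosfilt(sos_inaudible, x)
```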
 When the digital data of the external sound from the sound collector 3 includes the digital data of the left and right external sounds, the filter unit 47 may, under the control of the control unit 50, separate the digital data of the left external sound into left inaudible sound data and left audible sound data. Likewise, under the control of the control unit 50, the filter unit 47 may separate the digital data of the right external sound into right inaudible sound data and right audible sound data. The filter unit 47 may output the left and right inaudible sound data to the section detection unit 51 and the pattern detection unit 53, and may output the left and right audible sound data to the section buffer 52. Hereinafter, when the left audible sound data and the right audible sound data need not be distinguished, they are simply referred to as the "audible sound data". Similarly, when the left inaudible sound data and the right inaudible sound data need not be distinguished, they are simply referred to as the "inaudible sound data".
 The control unit 50 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU or a GPU, or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA or an ASIC. The control unit 50 executes processing related to the operation of the processing device 4 while controlling each part of the processing device 4.
 The control unit 50 includes a section detection unit 51, a section buffer 52, a pattern detection unit 53, and a reproduction unit 54. The section buffer 52 may be a part of the storage unit 46. The operations of the section buffer 52 are executed by the processor or the like of the control unit 50.
 The section detection unit 51 acquires the inaudible sound data from the filter unit 47. Based on the inaudible sound data, the section detection unit 51 detects a sound section in which sound continues in the audible sound data. As described above, the output device 2 generates inaudible sound data that includes the audible sound information as a component. Therefore, when the audible sound data contains a sound section, the inaudible sound data corresponding to that sound section contains sound data other than noise. In this embodiment, the section detection unit 51 therefore detects the sound section by determining whether the inaudible sound data contains sound data other than noise. An example of the processing by the section detection unit 51 will be described below.
 The section detection unit 51 divides the inaudible sound data into frames each having a predetermined frame length. For example, the section detection unit 51 divides the inaudible sound data such that one frame contains several hundred to several thousand sound samples. In the same or a similar manner to the processing described above with reference to FIG. 3, the section detection unit 51 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length.
 After dividing the inaudible sound data into frames, the section detection unit 51 determines, for each frame, whether the frame contains sound data other than noise by the following processing.
 The section detection unit 51 applies a fast Fourier transform (FFT) to the sound data contained in each frame and obtains a frequency spectrum signal for each frame. This frequency spectrum signal corresponds to the frequency spectrum signal sg1 shown in FIG. 5, described later, and to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2. The section detection unit 51 calculates the power of the frequency spectrum signal for each frame, for example by squaring the frequency spectrum signal.
 The section detection unit 51 determines whether the power of the frequency spectrum signal of each frame is equal to or greater than a power threshold. The power threshold may be set in advance as a fixed value. Alternatively, the section detection unit 51 may set a power threshold for each frame. In this case, the section detection unit 51 may calculate the per-frame power threshold based on a statistical estimate of the noise power for each frame, such as a mean or a variance.
 When the section detection unit 51 determines that the power of the frequency spectrum signal is equal to or greater than the power threshold, it determines that the frame corresponding to that frequency spectrum signal contains sound data other than noise. When the section detection unit 51 determines that the power of the frequency spectrum signal is below the power threshold, it determines that the frame corresponding to that frequency spectrum signal does not contain sound data other than noise.
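 The per-frame power test can be sketched as follows; the threshold value is left as a parameter because, as noted above, it may be fixed or estimated statistically.

```python
import numpy as np

def frame_has_sound(frame: np.ndarray, power_threshold: float) -> bool:
    spectrum = np.fft.rfft(frame)                       # per-frame frequency spectrum
    power = float(np.sum(np.abs(spectrum) ** 2)) / len(frame)
    return power >= power_threshold                     # True: sound other than noise
```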
 The section detection unit 51 detects, as a sound section, a section in which frames determined to contain sound data other than noise continue. The section detection unit 51 may detect the sound section with hangover processing.
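 A hedged sketch of the hangover processing mentioned above follows: a detection is kept alive for a few extra frames so that short dips do not split one sound section. The hangover length of five frames is an assumed parameter.

```python
def apply_hangover(flags: list[bool], hangover: int = 5) -> list[bool]:
    out, remaining = [], 0
    for flag in flags:
        if flag:
            remaining = hangover              # re-arm the hangover counter
            out.append(True)
        else:
            out.append(remaining > 0)         # bridge short gaps
            remaining = max(0, remaining - 1)
    return out
```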
 The section detection unit 51 may detect the sound section by another method. For example, the section detection unit 51 may detect the sound section by separating a noise cluster from a cluster of sound data other than noise using a Gaussian mixture model (GMM).
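 The GMM-based alternative could, for example, cluster per-frame log powers into a noise cluster and a sound cluster; the use of scikit-learn and the two-component configuration are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_sound_flags(frame_powers: np.ndarray) -> np.ndarray:
    log_power = np.log(frame_powers + 1e-12).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(log_power)
    sound_cluster = int(np.argmax(gmm.means_.ravel()))  # louder cluster = sound
    return gmm.predict(log_power) == sound_cluster
```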
 The section detection unit 51 may acquire the left inaudible sound data and the right inaudible sound data from the filter unit 47. In this case, the section detection unit 51 may detect a sound section in which sound continues in the audible sound data based on both the left and right inaudible sound data. For example, in the same or a similar manner to the above, the section detection unit 51 divides the left inaudible sound data into frames to obtain left frames, and obtains a frequency spectrum signal for each left frame. Likewise, the section detection unit 51 divides the right inaudible sound data into frames to obtain right frames, and obtains a frequency spectrum signal for each right frame. In the same or a similar manner to the above, the section detection unit 51 determines whether the power of the frequency spectrum signal of each left and right frame is equal to or greater than the power threshold. Here, a frame whose frequency spectrum signal power exceeds the power threshold is referred to as a "True frame". The section detection unit 51 determines whether at least one of the left and right frames is a True frame. When the section detection unit 51 determines that at least one of the left and right frames is a True frame, it determines that the inaudible sound frame corresponding to the left and right frames contains sound data other than noise. Here, the inaudible sound frame corresponding to the left and right frames is the left inaudible sound frame and the right inaudible sound frame regarded as a single frame. On the other hand, when the section detection unit 51 does not determine that at least one of the left and right frames is a True frame, it determines that the inaudible sound frame corresponding to the left and right frames does not contain sound data other than noise. In the same or a similar manner to the above, the section detection unit 51 detects, as a sound section, a section in which frames determined to contain sound data other than noise continue. However, depending on the settings of the processing device 4, the section detection unit 51 may instead determine whether both the left and right frames are True frames. In this case, when the section detection unit 51 determines that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames contains sound data other than noise. On the other hand, when the section detection unit 51 does not determine that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames does not contain sound data other than noise.
 When the section detection unit 51 detects a sound section, it generates a section ID, which is identification information that uniquely identifies the sound section. The section detection unit 51 outputs the information of the sound section and the section ID to the section buffer 52 and the pattern detection unit 53. The information of the sound section includes the start point and end point of the sound section, which are specified, for example, by time.
 The section buffer 52 acquires the audible sound data from the filter unit 47, and acquires the information of the sound section and the section ID from the section detection unit 51. Based on the information of the sound section acquired from the section detection unit 51, the section buffer 52 extracts the audible sound data contained in the sound section from the audible sound data acquired from the filter unit 47. The section buffer 52 holds the extracted audible sound data of the sound section in association with the section ID.
 The section buffer 52 may acquire the left and right audible sound data from the filter unit 47. In this case, the section buffer 52 may extract and hold the left and right audible sound data of the sound section.
 When a predetermined time has elapsed after the section buffer 52 starts holding the audible sound data of a sound section, the section buffer 52 may delete the held audible sound data of that sound section. The predetermined time may be set based on the amount of data that the section buffer 52 can hold.
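 A minimal sketch of such time-limited holding follows; the class layout and the use of a monotonic clock are assumptions, not details given in the embodiment.

```python
import time

class SectionBuffer:
    """Holds per-section audible sound data and drops entries older than max_age seconds."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self._entries: dict[str, tuple[float, object]] = {}

    def store(self, section_id: str, samples) -> None:
        self._entries[section_id] = (time.monotonic(), samples)
        self.expire()

    def get(self, section_id: str):
        entry = self._entries.get(section_id)
        return entry[1] if entry else None

    def expire(self) -> None:
        now = time.monotonic()
        self._entries = {sid: (t, s) for sid, (t, s) in self._entries.items()
                         if now - t < self.max_age}
```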
 The pattern detection unit 53 acquires the inaudible sound data from the filter unit 47, and acquires the information of the sound section and the section ID from the section detection unit 51.
 Upon acquiring the information of the sound section and the section ID from the section detection unit 51, the pattern detection unit 53 extracts the inaudible sound data of the sound section from the inaudible sound data acquired from the filter unit 47.
 The pattern detection unit 53 determines whether the inaudible sound component of the extracted sound section satisfies a preset condition. In this embodiment, this condition is that at least a part of the frequency spectrum signal of the inaudible sound data of the sound section matches a setting pattern described later. The processing by the pattern detection unit 53 will be described below.
 The pattern detection unit 53 divides the extracted inaudible sound data of the sound section into frames each having a predetermined frame length. For example, the pattern detection unit 53 divides the inaudible sound data such that one frame contains several hundred to several thousand sound samples. In the same or a similar manner to the processing described above with reference to FIG. 3, the pattern detection unit 53 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length.
 The pattern detection unit 53 applies a fast Fourier transform to the digital sound data contained in each frame and obtains a frequency spectrum signal for each frame. This frequency spectrum signal corresponds to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2.
 The pattern detection unit 53 determines whether the shape of at least a part of the frequency spectrum signal of the inaudible sound data of the sound section matches the setting pattern. The pattern detection unit 53 may make this determination by computing a two-dimensional cross-correlation.
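 The two-dimensional cross-correlation test could be sketched as follows; the normalization, the match threshold of 0.8, and the SciPy dependency are assumptions, and the spectrogram is assumed to be at least as large as the setting pattern in both dimensions.

```python
import numpy as np
from scipy.signal import correlate2d

def pattern_matches(spectrogram: np.ndarray, pattern: np.ndarray,
                    threshold: float = 0.8) -> bool:
    s = (spectrogram - spectrogram.mean()) / (spectrogram.std() + 1e-12)
    p = (pattern - pattern.mean()) / (pattern.std() + 1e-12)
    corr = correlate2d(s, p, mode="valid") / p.size   # normalized 2-D cross-correlation
    return bool(corr.max() >= threshold)
```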
 The setting pattern is generated, for example, based on preset audible sound data. For example, the setting pattern is generated from a preset fundamental frequency of an audible sound, the length of that audible sound, and the speed of that audible sound. The setting pattern may also be generated based on the shape of the spectrum signal obtained when the preset audible sound data is converted into a spectrum signal of fundamental frequency components. For example, suppose the output device 2 is used in an airport waiting room, and the user does not want to miss information about "Flight 153". In this case, the setting pattern is generated based on the audible sound data of "Flight 153" output by the output device 2.
 For example, FIG. 5 shows a graph of a frequency spectrum signal sg1, which the pattern detection unit 53 obtained by applying a fast Fourier transform to inaudible sound data. In FIG. 5, the horizontal axis indicates time and the vertical axis indicates frequency. The upper part of FIG. 5 shows the digital data of the audible sound for reference. In FIG. 5, the section from time t1 to time t2 is the sound section. The pattern detection unit 53 determines that the frequency spectrum signal sg2 contained in the dotted-line portion of the frequency spectrum signal sg1 matches the setting pattern.
 When the pattern detection unit 53 determines that the shape of at least a part of the frequency spectrum signal of the inaudible sound of the sound section matches the setting pattern, it outputs a reproduction trigger and the section ID to the reproduction unit 54.
 The pattern detection unit 53 may acquire the left and right inaudible sound data from the filter unit 47. In this case, the pattern detection unit 53 may determine whether the components of both the left and right inaudible sounds in the sound section satisfy the preset condition.
 The reproduction unit 54 acquires the reproduction trigger and the section ID from the pattern detection unit 53. Upon acquiring the reproduction trigger, the reproduction unit 54 executes reproduction processing. In the reproduction processing, the reproduction unit 54 acquires the audible sound data associated with the section ID from the section buffer 52. The reproduction unit 54 sets the replay flag to True and transmits the acquired audible sound data to the sound collector 3 as reproduction data through the communication unit 40. After transmitting all the reproduction data to the sound collector 3, the reproduction unit 54 sets the replay flag to False.
 When the section buffer 52 holds the left and right audible sound data of the sound section, the reproduction unit 54 may transmit the left audible sound data to the sound collector 3 as left reproduction data and the right audible sound data as right reproduction data.
 <Notification processing>
 When the pattern detection unit 53 determines that the inaudible sound component of the sound section satisfies the preset condition, the control unit 50 notifies the user through the notification unit 42 that a sound satisfying the condition has been detected.
 As an example, the control unit 50 may cause the display unit 43 to display the text corresponding to the setting pattern and the detection time at which the sound satisfying the condition was detected. The text corresponding to the setting pattern is a verbalization of the audible sound used to generate the setting pattern. For example, when the text corresponding to the setting pattern is "Flight 512" and the detection time is 11:11, the control unit 50 causes the display unit 43 to display the information "Flight 512, detection time 11:11".
 As another example, the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the vibration unit 44 to vibrate the processing device 4. As yet another example, the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the light emitting unit 45 to emit light.
 After notifying the user, the control unit 50 may receive a reproduction instruction through the input unit 41. When the text and the detection time are displayed on the display unit 43 and the input unit 41 is a touch screen provided integrally with the display of the display unit 43, the reproduction instruction may be a touch operation on the text and the detection time displayed on the display unit 43. Upon receiving the reproduction instruction, the control unit 50 outputs, to the reproduction unit 54, the section ID corresponding to the text and detection time on which the touch operation was received, together with a reproduction trigger. Upon acquiring the section ID and the reproduction trigger, the reproduction unit 54 executes the reproduction processing as described above.
 (Operation of the output device)
 FIG. 6 is a flowchart showing the flow of the sound output processing executed by the output device 2 shown in FIG. 2. In the following, it is assumed that the user inputs text data from the input unit 10. When the user inputs the text data from the input unit 10, the control unit 18 starts the processing of step S1.
 The control unit 18 receives the input of the text data through the input unit 10 (step S1). The control unit 18 causes the conversion unit 12 to convert the text data received in step S1 into audible sound data (step S2). The control unit 18 electrically connects the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 Upon acquiring the audible sound data via the switch 13, the extraction unit 19 divides the acquired audible sound data into frames (step S3). The extraction unit 19 extracts, for each frame, the fundamental frequency from the audible sound samples contained in the frame (step S4).
 Upon acquiring the fundamental frequency data for each frame from the extraction unit 19, the generation unit 20 generates a sine wave for each frame based on the fundamental frequency data (step S5). After generating the sine waves, the generation unit 20 synthesizes the per-frame sine waves to generate the inaudible sound data (step S6).
 The control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data (step S7). The control unit 18 causes the speaker 16 to convert the sound data generated in step S7 into sound and output it (step S8).
 In the processing of step S1, the control unit 18 may instead receive the input of audible sound data through the input unit 11. In this case, instead of performing the conversion of step S2, the control unit 18 electrically connects the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 (Operation of the sound collector)
 FIG. 7 is a flowchart showing the flow of the sound collection processing executed by the sound collector 3 shown in FIG. 4. For example, when the control unit 34 receives an instruction to start the sound collection processing from the processing device 4 through the communication unit 32, it starts the processing of step S11.
 The control unit 34 determines whether the replay flag from the processing device 4 is True (step S11). When the control unit 34 determines that the replay flag is True (step S11: YES), the processing proceeds to step S12. When the control unit 34 determines that the replay flag is False (step S11: NO), the processing proceeds to step S13.
 In the processing of step S12, the control unit 34 performs control so as to operate in the reproduction mode. The reproduction unit 36 causes the speaker 31 to output the reproduction data accumulated in the accumulation unit 37.
 In the processing of step S13, the control unit 34 performs control so as to operate in the through mode. The reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35.
 After the processing of step S12 or step S13, the control unit 34 returns to the processing of step S11.
 (Operation of the processing device)
 FIGS. 8 and 9 are flowcharts showing the flow of the sound acquisition processing executed by the processing device 4 shown in FIG. 4. For example, when the transmission of the digital data of the external sound from the sound collector 3 to the processing device 4 starts, the control unit 50 starts the processing of step S21.
 The control unit 50 receives the digital data of the external sound from the sound collector 3 through the communication unit 40 (step S21). The control unit 50 causes the filter unit 47 to separate the received digital data of the external sound into inaudible sound data and audible sound data (step S22).
 The section detection unit 51 divides the inaudible sound data acquired from the filter unit 47 into frames (step S23). The section detection unit 51 detects a sound section by determining whether each frame contains sound data other than noise (step S24).
 Based on the information of the sound section acquired from the section detection unit 51, the section buffer 52 extracts the audible sound data contained in the sound section from the audible sound data acquired from the filter unit 47. The section buffer 52 holds the extracted audible sound data of the sound section in association with the section ID (step S25).
 The pattern detection unit 53 divides the inaudible sound data of the sound section detected in step S24 into frames (step S26).
 The pattern detection unit 53 applies a fast Fourier transform to the digital sound data contained in each frame and obtains a frequency spectrum signal for each frame (step S27).
 The pattern detection unit 53 determines whether the shape of at least a part of the frequency spectrum signal of the inaudible sound data of the sound section matches the setting pattern (step S28).
 When the pattern detection unit 53 determines that the shape of at least a part of the frequency spectrum signal of the inaudible sound of the sound section matches the setting pattern (step S28: YES), the processing device 4 proceeds to the processing of step S29. On the other hand, when the pattern detection unit 53 does not determine that the shape of at least a part of the frequency spectrum signal of the inaudible sound of the sound section matches the setting pattern (step S28: NO), the processing device 4 returns to the processing of step S21.
 In the processing of step S29, upon acquiring the reproduction trigger and the section ID from the pattern detection unit 53, the reproduction unit 54 sets the replay flag to True. The reproduction unit 54 acquires the audible sound data associated with the section ID from the section buffer 52 and transmits the acquired audible sound data as reproduction data to the sound collector 3 through the communication unit 40 (step S30).
 The reproduction unit 54 determines whether all the reproduction data has been transmitted to the sound collector 3 (step S31). When the reproduction unit 54 determines that all the reproduction data has been transmitted to the sound collector 3 (step S31: YES), it sets the replay flag to False (step S32). When the reproduction unit 54 does not determine that all the reproduction data has been transmitted to the sound collector 3 (step S31: NO), it returns to the processing of step S30.
 In the processing of step S33, the control unit 50 notifies the user through the notification unit 42 that a sound satisfying the preset condition has been detected.
 The control unit 50 determines whether a reproduction instruction has been received through the input unit 41 (step S34). When the control unit 50 determines that a reproduction instruction has been received through the input unit 41 (step S34: YES), the processing proceeds to step S35 shown in FIG. 9. On the other hand, when the control unit 50 does not determine that a reproduction instruction has been received through the input unit 41 (step S34: NO), it repeats the processing of step S34. The control unit 50 may end the sound acquisition processing when a predetermined time has elapsed while the processing of step S34 is being repeated. The predetermined time may be set based on the specifications of the processing device 4.
 The control unit 50 executes the processing of steps S35, S36, S37, and S38 shown in FIG. 9 in the same or a similar manner to steps S29, S30, S31, and S32 shown in FIG. 8. After the processing of step S38, the control unit 50 returns to the processing of step S21.
 As described above, in this embodiment, the control unit 18 of the output device 2 receives, through the input unit 10, text data of an audible sound such as an announcement sound to be output from the output device 2. Alternatively, the control unit 18 receives, through the input unit 11, data of an audible sound such as an announcement sound to be output from the output device 2. The control unit 18 generates an inaudible sound that includes the information of the received audible sound as a component, and outputs, through the speaker 16, a sound in which the audible-range sound and the inaudible-range sound are superimposed. Inaudible sounds cannot be heard by most people. However, some people can hear inaudible sounds and may find them unpleasant. In this embodiment, by outputting a sound in which the audible sound and the inaudible sound are superimposed from the speaker 16, the possibility that a person who can hear inaudible sounds finds the inaudible sound unpleasant can be reduced.
 また、本実施形態では、処理装置4の制御部50は、集音器3を介して外部音のデータを通信部40によって受信することにより取得する。外部音のデータには、出力装置2が出力したアナウンス音のデータが含まれ得る。制御部50は、外部音の非可聴音の成分によって、外部音のうち、出力装置2が出力したアナウンス音等の可聴音を検出する。ここで、非可聴音は、可聴音と比較すると、自然界には少ない音である。つまり、騒音の多くは、可聴音である。したがって、外部音に騒音が含まれる場合でも、制御部50は、外部音の非可聴音の成分によって、出力装置2が出力した可聴音を精度良く検出することができる。 Furthermore, in the present embodiment, the control unit 50 of the processing device 4 acquires external sound data by having the communication unit 40 receive the external sound data via the sound collector 3 . The external sound data may include announcement sound data output by the output device 2. The control unit 50 detects an audible sound such as an announcement sound outputted by the output device 2 from among the external sounds based on the inaudible sound component of the external sound. Here, inaudible sounds are rare in nature compared to audible sounds. In other words, most of the noise is audible. Therefore, even if the external sound includes noise, the control unit 50 can accurately detect the audible sound output by the output device 2 based on the inaudible sound component of the external sound.
 よって、本実施形態によれば、音を精度良く検出するための技術を提供することができる。 Therefore, according to the present embodiment, it is possible to provide a technique for detecting sound with high accuracy.
 さらに、本実施形態では、処理装置4の制御部50は、外部音の非可聴音の成分が予め設定された条件を満たす場合、ユーザに条件を満たす音が検出されたことを通知してもよい。このような構成により、ユーザは、条件を満たす音が検出されたことを知ることができる。 Furthermore, in the present embodiment, if the inaudible sound component of the external sound satisfies a preset condition, the control unit 50 of the processing device 4 may notify the user that a sound satisfying the condition has been detected. good. With such a configuration, the user can know that a sound satisfying the condition has been detected.
 また、本実施形態では、処理装置4の制御部50は、条件を満たす音が検出されたことをユーザに通知した後、再生指示を入力部41によって受け付けると、音区間の可聴域の音のデータを再生する再生処理を実行してもよい。例えば、再生部54は、音区間の可聴域の音のデータを再生データとして集音器3に送信し、集音器3によって再生させてもよい。ここで、ユーザは、空港の待合室又は駅構内等において、出力装置2が出力するアナウンス音に注意を払っていることが少ない。音区間の可聴域の音のデータを再生することにより、ユーザは、聞き逃したアナウンス音を再度聞くことができる。 In the present embodiment, when the input unit 41 receives a playback instruction after notifying the user that a sound satisfying the condition has been detected, the control unit 50 of the processing device 4 detects a sound in the audible range of the sound interval. A playback process for playing back the data may also be executed. For example, the playback unit 54 may transmit the data of the audible range sound in the sound section to the sound collector 3 as playback data, and the sound collector 3 may play the data. Here, users rarely pay attention to the announcement sound output by the output device 2 in an airport waiting room, a station premises, or the like. By reproducing the sound data in the audible range of the sound section, the user can listen again to the missed announcement sound.
 また、本実施形態では、処理装置4の制御部50は、外部音の非可聴音の成分が予め設定された条件を満たす場合、外部音の可聴域の音を再生する再生処理を実行してもよい。例えば、再生部54は、パターン検出部53から再生トリガ及び区間IDを取得すると、区間バッファ52から区間IDに対応する可聴音のデータを取得してよい。再生部54は、取得した可聴音のデータを集音器3に送信して集音器3によって再生させてもよい。このような構成により、ユーザは、速やかに音を聞くことができる。 Furthermore, in the present embodiment, the control unit 50 of the processing device 4 executes a reproduction process for reproducing the audible range of the external sound when the inaudible sound component of the external sound satisfies a preset condition. Good too. For example, upon acquiring the reproduction trigger and section ID from the pattern detection section 53, the reproduction section 54 may acquire audible sound data corresponding to the section ID from the section buffer 52. The reproduction unit 54 may transmit the acquired audible sound data to the sound collector 3 and have the sound collector 3 reproduce it. With such a configuration, the user can quickly hear the sound.
(Other embodiments)
 Processing systems according to other embodiments are described below. A processing system according to another embodiment includes an output device 2 as shown in FIG. 2, a sound collector 3 as shown in FIG. 4, and a processing device 104 as shown in FIG. 10, described later.
(Configuration of output device)
 An output device 2 according to another embodiment is described with reference to FIG. 2.
 The generation unit 20 according to this other embodiment generates, for each frame, a sine wave based on the fundamental-frequency data, in the same or a similar manner as in the embodiment described above. In this other embodiment, however, the generation unit 20 generates the sine wave x(t) at time t by, for example, equation (2), which yields a sine wave carrying the fundamental frequency f0 as a component:
   x(t) = A sin{2πt·(n·f0 + X)}   (2)
 In equation (2), the value n is a number of 1.0 or more, set within a range in which (n·f0 + X) does not exceed the sampling rate of the sound collector 3. The value n may also be set based on the resolution of the fast Fourier transform performed by the processing device 104; for example, the lower the resolution of the fast Fourier transform, the larger n may be set.
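 As a rough illustration of equation (2), the following Python sketch generates a per-frame inaudible carrier; the sampling rate, frame length, amplitude A, offset X, and multiplier n are hypothetical values chosen only for this example and are not values taken from the present disclosure.

    import numpy as np

    def inaudible_carrier(f0_per_frame, fs=48_000, frame_len=1024,
                          A=0.05, X=18_000.0, n=1.0):
        """Sketch of equation (2): x(t) = A*sin(2*pi*t*(n*f0 + X)), per frame.

        f0_per_frame : fundamental frequency [Hz] estimated for each frame
        fs           : output sampling rate [Hz] (hypothetical value)
        (n*f0 + X) is assumed to stay above the audible band and within
        the rate the sound collector can sample.
        """
        frames = []
        for i, f0 in enumerate(f0_per_frame):
            # absolute time axis, following the variable t in equation (2)
            t = (i * frame_len + np.arange(frame_len)) / fs
            frames.append(A * np.sin(2 * np.pi * t * (n * f0 + X)))
        return np.concatenate(frames)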
 After generating a sine wave for each frame, the generation unit 20 synthesizes the per-frame sine waves to generate the inaudible sound data, in the same or a similar manner as in the embodiment described above. After generating the inaudible sound data, the generation unit 20 outputs it to the superimposition unit 15, as in the embodiment described above.
 As in the embodiment described above, the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate the sound data. In this other embodiment, however, when superimposing the audible sound data and the inaudible sound data, the control unit 18 may correct a phase shift between the two by phase adjustment. The output device 2 may further include a high-pass filter (HPF) for removing audible-range noise that arises when the audible sound data and the inaudible sound data are superimposed.
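 As one possible realization of such a high-pass filter, the sketch below applies a Butterworth HPF to the generated inaudible component before superimposition; both this placement and the cutoff frequency and filter order are hypothetical choices for the example, not details fixed by the present disclosure.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def highpass(x, fs=48_000, cutoff=17_000.0, order=8):
        """Suppress audible-range leakage in the inaudible component (sketch)."""
        sos = butter(order, cutoff, btype="highpass", fs=fs, output="sos")
        return sosfilt(sos, x)

    # e.g. mixed = audible + highpass(inaudible, fs=48_000)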
 As in the embodiment described above, the control unit 18 causes the speaker 16 to convert the sound data from the superimposition unit 15 into sound and output it.
 The other configurations of the output device 2 according to this other embodiment are the same as or similar to those of the output device 2 according to the embodiment described above.
(Configuration of processing device)
 As shown in FIG. 10, a processing device 104 according to another embodiment includes a communication unit 40, an input unit 41, a notification unit 42, a storage unit 46, a filter unit 47, and a control unit 150.
 The storage unit 46 according to this other embodiment stores keywords set in advance by the user. The user sets keywords related to information that the user does not want to miss. For example, if the user does not want to miss information regarding "Flight 153", the user sets the keyword "Flight 153" and stores it in the storage unit 46.
 The storage unit 46 according to this other embodiment also stores information on the constant X and the value n used in equation (2).
 The control unit 150 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU or a GPU, or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA or an ASIC. The control unit 150 executes processing related to the operation of the processing device 104 while controlling each part of the processing device 104.
 The control unit 150 includes a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61. The buffer 55 may be part of the storage unit 46. The operations of the buffer 55 are executed by the processor of the control unit 150 or the like.
 As in the embodiment described above, the control unit 150 receives the external sound data from the sound collector 3 through the communication unit 40 and inputs the received external sound data to the filter unit 47. As in the embodiment described above, the filter unit 47 separates the external sound data into inaudible sound data and audible sound data, outputs the inaudible sound data to the section detection unit 51, and outputs the audible sound data to the section buffer 52 and the buffer 55.
 As in the embodiment described above, the section detection unit 51 acquires the inaudible sound data from the filter unit 47 and, based on the inaudible sound data, detects a sound section in which sound continues in the audible sound data. Upon detecting a sound section, the section detection unit 51 generates a section ID and outputs the information on the sound section and the section ID to the section buffer 52, as in the embodiment described above. In this other embodiment, the section detection unit 51 also outputs the information on the sound section and the section ID to the buffer 55.
 In this other embodiment, the section detection unit 51 extracts the inaudible sound data of the sound section from the inaudible sound data acquired from the filter unit 47, and outputs the extracted inaudible sound data of the sound section and the section ID to the first extraction unit 56.
 The section detection unit 51 may acquire left-channel and right-channel inaudible sound data from the filter unit 47. In this case, the section detection unit 51 extracts the left-channel inaudible sound data of the sound section from the left-channel inaudible sound data, and the right-channel inaudible sound data of the sound section from the right-channel inaudible sound data. The section detection unit 51 may output the extracted left-channel and right-channel inaudible sound data of the sound section and the section ID to the first extraction unit 56.
 As in the embodiment described above, the section buffer 52 acquires the audible sound data from the filter unit 47 and acquires the information on the sound section and the section ID from the section detection unit 51. Based on the information on the sound section acquired from the section detection unit 51, the section buffer 52 extracts, from the audible sound data acquired from the filter unit 47, the audible sound data included in the sound section, and holds the extracted audible sound data of the sound section in association with the section ID.
 In this other embodiment, the section buffer 52 outputs the audible sound data included in the extracted sound section and the section ID to the removal unit 58. When the section buffer 52 has extracted left-channel and right-channel audible sound data of the sound section, it may output the left-channel and right-channel audible sound data of the sound section and the section ID to the removal unit 58.
 The buffer 55 acquires the audible sound data from the filter unit 47 and acquires the information on the sound section and the section ID from the section detection unit 51. Based on the information on the sound section acquired from the section detection unit 51, the buffer 55 extracts, from the audible sound data acquired from the filter unit 47, the audible sound data included in sections other than the sound section, and holds the extracted data.
 The buffer 55 outputs the audible sound data included in the sections other than the sound section and the section ID to the second extraction unit 57.
 The buffer 55 may acquire left-channel and right-channel audible sound data from the filter unit 47. In this case, the buffer 55 may extract the left-channel and right-channel audible sound data included in the sections other than the sound section, and may output them together with the section ID to the second extraction unit 57.
 The first extraction unit 56 acquires the inaudible sound data of the sound section and the section ID from the section detection unit 51, and extracts the fundamental frequency of the audible sound of that sound section from the acquired inaudible sound data. As described above, the output device 2 uses equation (2) to generate inaudible sound data that contains information on the fundamental frequency of the audible sound; the inaudible data of a sound section therefore carries the information on the fundamental frequency of that section. An example of the process of extracting the fundamental frequency of the audible sound is described below.
 As in the process described above with reference to FIG. 3, the first extraction unit 56 divides the inaudible sound data into frames having a predetermined frame length. Having divided the inaudible sound data into frames, the first extraction unit 56, in the same or a similar manner as the section detection unit 51, applies a fast Fourier transform to the sound data of each frame to obtain a frequency spectrum signal for each frame. For example, the first extraction unit 56 applies the fast Fourier transform to several thousand samples of sound data every several hundred samples.
 Upon obtaining the frequency spectrum signal for each frame, the first extraction unit 56 calculates the power of each frame's frequency spectrum signal, for example by squaring the frequency spectrum signal. The first extraction unit 56 then extracts, from the per-frame frequency spectrum signals, those whose power is greater than or equal to a power threshold. The power threshold may be the same as the power threshold used by the section detection unit 51, or may be set separately.
 Having extracted the frequency spectrum signals whose power is greater than or equal to the power threshold, the first extraction unit 56 extracts the fundamental-frequency information from them. For example, the first extraction unit 56 extracts the fundamental frequency f0 of equation (2) from the extracted frequency spectrum signal and from the information on the constant X and the value n of equation (2) stored in the storage unit 46. The fundamental frequency extracted in this way by the first extraction unit 56 is the fundamental frequency of the audible sound output by the output device 2.
 The first extraction unit 56 outputs the extracted fundamental-frequency information and the section ID to the emphasis unit 59.
 The first extraction unit 56 may acquire the left-channel and right-channel inaudible sound data of the sound section and the section ID from the section detection unit 51. In this case, the first extraction unit 56 may extract the fundamental-frequency information from either the left-channel or the right-channel inaudible sound data of the sound section.
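 A minimal sketch of this fundamental-frequency extraction follows, assuming the dominant peak of the inaudible band lies at n·f0 + X so that f0 = (peak frequency − X)/n; the frame length, sampling rate, X, n, and power threshold are hypothetical example values.

    import numpy as np

    def extract_f0(frame, fs=48_000, X=18_000.0, n=1.0, power_th=1e-4):
        """Recover f0 from one frame of inaudible sound data (sketch)."""
        spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
        power = np.abs(spectrum) ** 2
        k = int(np.argmax(power))
        if power[k] < power_th:
            return None  # no sufficiently strong inaudible component
        peak_freq = k * fs / len(frame)
        return (peak_freq - X) / n  # invert equation (2)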
 The second extraction unit 57 acquires, from the buffer 55, the audible sound data included in the sections other than the sound section and the section ID, and extracts noise-component information based on that data. The audible sound data included in sections other than the sound section is unlikely to be audible sound data output from the output device 2; rather, it is highly likely to be noise data. The second extraction unit 57 therefore extracts the noise-component information from the audible sound data of the sections other than the sound section. An example of the noise extraction process is described below.
 As in the process described above with reference to FIG. 3, the second extraction unit 57 divides the audible sound data included in the sections other than the sound section into frames having a predetermined frame length. Having divided the audible sound data into frames, the second extraction unit 57, in the same or a similar manner as the section detection unit 51, applies a fast Fourier transform to the sound data of each frame to obtain a frequency spectrum signal for each frame. For example, the second extraction unit 57 applies the fast Fourier transform to several thousand samples of sound data every several hundred samples.
 The second extraction unit 57 takes the per-frame frequency spectrum signals as the noise frequency spectrum signal; that is, the second extraction unit 57 acquires the noise frequency spectrum signal as the noise-component information. The second extraction unit 57 outputs the noise frequency spectrum signal and the section ID to the removal unit 58.
 The second extraction unit 57 may acquire, from the buffer 55, the left-channel and right-channel audible sound data included in the sections other than the sound section, and the section ID. In this case, the second extraction unit 57 may extract left-channel noise-component information from the left-channel audible sound data and right-channel noise-component information from the right-channel audible sound data of those sections. The second extraction unit 57 may output the extracted left-channel and right-channel noise-component information, for example the noise frequency spectrum signals, and the section ID to the removal unit 58.
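 The noise-component information might, for instance, be summarized as an average magnitude spectrum over the frames outside the sound section, as in this sketch; note that the present disclosure keeps a spectrum per frame, so the averaging and the framing parameters here are simplifying assumptions for the example.

    import numpy as np

    def estimate_noise_spectrum(non_section_audio, frame_len=2048, hop=512):
        """Average magnitude spectrum of audio outside the sound section (sketch)."""
        window = np.hanning(frame_len)
        spectra = [np.abs(np.fft.rfft(non_section_audio[i:i + frame_len] * window))
                   for i in range(0, len(non_section_audio) - frame_len + 1, hop)]
        return np.mean(spectra, axis=0)  # noise-component information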
 The removal unit 58 acquires the noise frequency spectrum signal and the section ID from the second extraction unit 57 as the noise-component information, and acquires the section ID and the audible sound data from the section buffer 52. Based on the noise-component information, the removal unit 58 removes the noise component from the audible sound data acquired from the section buffer 52. An example of the noise-removal process is described below.
 As in the process described above with reference to FIG. 3, the removal unit 58 divides the audible sound data of the sound section into frames having a predetermined frame length. Having divided the audible sound data into frames, the removal unit 58, in the same or a similar manner as the section detection unit 51, applies a fast Fourier transform to the sound data of each frame to obtain a frequency spectrum signal for each frame. For example, the removal unit 58 applies the fast Fourier transform to several thousand samples of sound data every several hundred samples.
 Based on the noise frequency spectrum signal acquired from the second extraction unit 57, the removal unit 58 removes the noise component from the per-frame frequency spectrum signal of the audible sound. The removal unit 58 may remove the noise component by any method, such as the spectral subtraction method or a Wiener filter.
 The removal unit 58 outputs the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID to the emphasis unit 59.
 The removal unit 58 may acquire the left-channel and right-channel audible sound data of the sound section and the section ID from the section buffer 52, and may acquire the left-channel and right-channel noise-component information and the section ID from the second extraction unit 57. In this case, the removal unit 58 may remove the left-channel noise component from the left-channel audible sound data of the sound section and the right-channel noise component from the right-channel audible sound data. The removal unit 58 may output the left-channel and right-channel audible sound data from which the noise components have been removed, for example as frequency spectrum signals, together with the section ID, to the emphasis unit 59.
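 As an example of the spectral subtraction method named above, the estimated noise magnitude could be subtracted per frame as in this sketch; the subtraction factor alpha and the spectral floor are hypothetical tuning values.

    import numpy as np

    def spectral_subtraction(frame_spectrum, noise_mag, alpha=1.0, floor=0.01):
        """Subtract an estimated noise magnitude from one frame (sketch).

        frame_spectrum : complex rFFT of one audible-sound frame
        noise_mag      : noise magnitude estimate, e.g. from
                         estimate_noise_spectrum() above
        """
        mag = np.abs(frame_spectrum)
        phase = np.angle(frame_spectrum)
        clean = np.maximum(mag - alpha * noise_mag, floor * mag)  # keep a floor
        return clean * np.exp(1j * phase)  # reuse the noisy phase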
 The emphasis unit 59 acquires the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID from the removal unit 58, and acquires the fundamental-frequency information and the section ID from the first extraction unit 56. Based on the fundamental-frequency information acquired from the first extraction unit 56, the emphasis unit 59 emphasizes, within the frequency spectrum signal of the audible sound acquired from the removal unit 58, the frequency spectrum signal of the audible sound output by the output device 2. The frequency of the glottal source produced by the vocal cords of a typical person is the fundamental frequency, and a typical human voice is generated from this glottal source by a spectral envelope in which the vocal tract acts as a vocal-tract filter. In other words, a typical human voice is generated by the convolution of integer multiples of the fundamental frequency, that is, harmonics, with the vocal-tract filter. The emphasis unit 59 therefore strengthens the power [dB] at frequencies that are integer multiples of the fundamental frequency acquired from the first extraction unit 56 within the frequency spectrum signal of the audible sound acquired from the removal unit 58. As described above, the fundamental frequency acquired from the first extraction unit 56 is the fundamental frequency of the sound output by the output device 2; by strengthening the power [dB] at integer multiples of this fundamental frequency, the frequency spectrum signal of the audible sound output by the output device 2 can be emphasized.
 As an example of the emphasis process, the emphasis unit 59 increases the power [dB] in the frequency ranges (m×f0 ± F) of the frequency spectrum signal of the audible sound acquired from the removal unit 58, as shown in FIG. 11. In FIG. 11, the horizontal axis indicates frequency and the vertical axis indicates the power gain of the frequency spectrum signal. The integer m satisfies 1 ≤ m ≤ M, and the integer M, the upper limit of m, may be set depending on the environment in which the processing system 1 is used. The fundamental frequency f0 is the fundamental frequency of the audible sound extracted by the first extraction unit 56, and the constant F is set based on, for example, fluctuations in the frequency of the audible sound output from the output device 2. In FIG. 11, the power gain at the frequency (m×f0) is set to B (where 1 < B), the power gain at each of the frequencies (m×f0 ± F) is set to 1, and the power gain is set so as to decay linearly from the frequency (m×f0) toward each of the frequencies (m×f0 ± F).
 After emphasizing the frequency spectrum signal of the audible sound output by the output device 2, the emphasis unit 59 outputs the emphasized frequency spectrum signal of the audible sound and the section ID to the conversion unit 60.
 The emphasis unit 59 may acquire, from the removal unit 58, the left-channel and right-channel frequency spectrum signals of the audible sound from which the noise components have been removed, and the section ID. In this case, the emphasis unit 59 may emphasize each of the left-channel and right-channel frequency spectrum signals of the audible sound, and may output the emphasized left-channel and right-channel frequency spectrum signals and the section ID to the conversion unit 60.
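 The triangular gain profile of FIG. 11 might be applied per frame as in the following sketch; the gain B, half-width F, and harmonic count M are hypothetical example values.

    import numpy as np

    def harmonic_emphasis(frame_spectrum, f0, fs=48_000, B=2.0, F=20.0, M=40):
        """Apply a gain peaking at B on each harmonic m*f0 and decaying
        linearly to 1 at m*f0 +/- F (sketch of the FIG. 11 profile)."""
        n_bins = len(frame_spectrum)
        freqs = np.arange(n_bins) * fs / (2 * (n_bins - 1))  # rFFT bin frequencies
        gain = np.ones(n_bins)
        for m in range(1, M + 1):
            center = m * f0
            if center + F > fs / 2:
                break  # stay below the Nyquist frequency
            tri = 1.0 + (B - 1.0) * np.clip(1.0 - np.abs(freqs - center) / F, 0.0, 1.0)
            gain = np.maximum(gain, tri)
        return frame_spectrum * gain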
 The conversion unit 60 acquires the frequency spectrum signal of the audible sound and the section ID from the emphasis unit 59, and converts the frequency spectrum signal of the audible sound into time-domain audible sound data by an inverse short-time Fourier transform (ISTFT). The conversion unit 60 outputs the converted time-domain audible sound data and the section ID to the playback unit 54 and the recognition unit 61.
 The conversion unit 60 may acquire the left-channel and right-channel frequency spectrum signals of the audible sound and the section ID from the emphasis unit 59. In this case, the conversion unit 60 may convert the left-channel frequency spectrum signal into time-domain left-channel audible sound data and the right-channel frequency spectrum signal into time-domain right-channel audible sound data, and may output the time-domain left-channel and right-channel audible sound data to the playback unit 54 and the recognition unit 61.
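 Assuming the per-frame spectra came from a matching short-time Fourier transform, the conversion back to the time domain could use scipy's ISTFT, as in this sketch; the window and hop sizes are hypothetical and must match the analysis side.

    import numpy as np
    from scipy.signal import istft

    def to_time_domain(frame_spectra, fs=48_000, frame_len=2048, hop=512):
        """ISTFT: stack of per-frame rFFT spectra -> time-domain audio (sketch)."""
        Zxx = np.asarray(frame_spectra).T  # scipy expects (freq bins, frames)
        _, audio = istft(Zxx, fs=fs, window="hann",
                         nperseg=frame_len, noverlap=frame_len - hop)
        return audio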
 The recognition unit 61 acquires the time-domain audible sound data and the section ID from the conversion unit 60, and acquires text data of the audible sound by executing speech recognition processing on the acquired audible sound data. The recognition unit 61 determines whether the acquired text data of the audible sound contains a keyword stored in the storage unit 46. If the recognition unit 61 determines that the text data of the audible sound contains a keyword, it outputs a playback trigger and the section ID to the playback unit 54.
 The recognition unit 61 may acquire the time-domain left-channel and right-channel audible sound data from the conversion unit 60 and acquire text data of the left-channel and right-channel audible sounds. In this case, if the recognition unit 61 determines that either the left-channel or the right-channel text data contains a keyword stored in the storage unit 46, it may output the playback trigger and the section ID to the playback unit 54.
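 The keyword check itself reduces to a substring test over the recognized transcript, as in this sketch; recognize() stands in for an unspecified speech-recognition backend and is hypothetical.

    def keyword_hit(audio, keywords, recognize):
        """Return True (a playback trigger) if any stored keyword appears (sketch).

        recognize : callable mapping time-domain audio to a text transcript
        keywords  : keywords preset by the user, e.g. ["Flight 153"]
        """
        text = recognize(audio)
        return any(kw in text for kw in keywords)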
 The playback unit 54 acquires the converted time-domain audible sound data and the section ID from the conversion unit 60, and acquires the playback trigger and the section ID from the recognition unit 61. Upon acquiring the playback trigger from the recognition unit 61, the playback unit 54 executes the playback process. In the playback process according to this other embodiment, the playback unit 54 sets the replay flag to True, takes as playback data the audible sound data, among the data acquired from the conversion unit 60, whose section ID matches the section ID acquired together with the playback trigger, and transmits the playback data to the sound collector 3 through the communication unit 40. After transmitting all the playback data to the sound collector 3, the playback unit 54 sets the replay flag to False.
 The playback unit 54 may acquire the time-domain left-channel and right-channel audible sound data and the section ID from the conversion unit 60. In this case, the playback unit 54 may use the time-domain left-channel and right-channel audible sound data as the playback data.
 The other configurations of the processing device 104 are the same as or similar to those of the processing device 4 according to the embodiment described above.
(Operation of output device)
 The operation of the output device 2 according to this other embodiment is described by the flowchart shown in FIG. 6, except that in the process of step S5 the generation unit 20 generates the sine wave using equation (2).
(Operation of sound collector)
 The operation of the sound collector 3 according to this other embodiment is described by the flowchart shown in FIG. 7.
(Operation of processing device)
 FIG. 12 is a flowchart showing the flow of the sound acquisition process executed by the processing device 104 according to this other embodiment. For example, the control unit 150 starts the process of step S41 when transmission of the digital data of the external sound from the sound collector 3 to the processing device 104 is started.
 The processing device 104 executes the processes of steps S41 to S44 in the same or a similar manner as the processes of steps S21 to S24 shown in FIG. 8.
 In the process of step S45, the first extraction unit 56 acquires the inaudible sound data of the sound section and the section ID from the section detection unit 51, and extracts the fundamental frequency of the audible sound of that sound section from the acquired inaudible sound data.
 In the process of step S46, the second extraction unit 57 acquires, from the buffer 55, the audible sound data included in the sections other than the sound section and the section ID, and extracts the noise-component information based on that data.
 In the process of step S47, the removal unit 58 acquires the section ID and the audible sound data from the section buffer 52, and removes the noise component from the acquired audible sound data based on the noise-component information extracted in the process of step S46.
 In the process of step S48, the emphasis unit 59 strengthens, within the frequency spectrum signal of the audible sound from which the noise component was removed in the process of step S47, the power of the sound at frequencies that are integer multiples of the fundamental frequency extracted in the process of step S45.
 In the process of step S49, the conversion unit 60 converts the frequency spectrum signal of the audible sound processed in step S48 into time-domain audible sound data, and the recognition unit 61 acquires text data of the audible sound by executing speech recognition processing on the time-domain audible sound data.
 In the process of step S50, the recognition unit 61 determines whether the text data of the audible sound acquired in the process of step S49 contains a keyword stored in the storage unit 46. If the recognition unit 61 determines that the text data of the audible sound contains a keyword (step S50: YES), the process proceeds to step S29 shown in FIG. 8. If the recognition unit 61 determines that the text data does not contain a keyword (step S50: NO), the processing device 104 returns to the process of step S41.
 As described above, in the processing device 104 according to this other embodiment, the first extraction unit 56 of the control unit 150 extracts the fundamental-frequency information from the inaudible-range sound data of the sound section, and the emphasis unit 59 of the control unit 150 strengthens the power of the sounds, among the audible sounds of the sound section, whose frequencies are integer multiples of the extracted fundamental frequency. With this configuration, among the audible sounds collected by the sound collector 3, the power of the audible sound output by the output device 2 can be strengthened, which reduces the influence of noise on that audible sound. With the influence of noise reduced, the user can clearly hear the sound output by the output device 2, such as an announcement, when the audible sound data is reproduced. The reduced influence of noise also increases the likelihood that the text data of the audible sound converted by the recognition unit 61 matches the text data of the audible sound, such as an announcement, output by the output device 2; with this configuration, the recognition unit 61 can accurately determine whether the text data of the audible sound contains preset text data.
 Although the present disclosure has been described based on the drawings and embodiments, it should be noted that a person skilled in the art could easily make various variations and modifications based on the present disclosure, and such variations and modifications are therefore included within the scope of the present disclosure. For example, the functions and the like included in each functional unit can be rearranged in any logically consistent way, and a plurality of functional units and the like may be combined into one or divided. In each embodiment, a functional unit, means, step, or the like may be added to another embodiment, or replaced with a functional unit, means, step, or the like of another embodiment, in any logically consistent way. The embodiments according to the present disclosure described above are not limited to being implemented exactly as described; they may be implemented with their features combined or partially omitted as appropriate.
 In the embodiment described above, the audible sound data input from the input unit 11 of the output device 2 and the audible sound data converted by the conversion unit 12 are described as digital sound data. However, they may instead be analog sound data. In this case, the input unit 11 may include a microphone or the like; that is, the user may input the audible sound data as analog data through the microphone of the input unit 11. The extraction unit 19 may then obtain digital audible sound data by sampling the analog audible sound data at a preset sampling rate.
 In the embodiment described above, the sound collector 3 and the processing device 4 are described as separate devices, as shown in FIG. 4, and the sound collector 3 and the processing device 104 are likewise described as separate devices, as shown in FIG. 10. However, the sound collector 3 and the processing device 4 may be configured as an integrated device, and so may the sound collector 3 and the processing device 104. An example of such an integrated configuration is described with reference to FIG. 13, taking the sound collector 3 and the processing device 4 as an example.
 The sound collector 103 shown in FIG. 13 may be an earphone. The sound collector 103 integrates the sound collector 3 and the processing device 4, and can also be regarded as a processing device. The sound collector 103 includes a microphone 30, a speaker 31, a storage unit 33, a filter unit 47, and a control unit 34. The control unit 34 includes an acquisition unit 35, a playback unit 36, a section detection unit 51, a section buffer 52, and a pattern detection unit 53.
 In FIG. 13, upon detecting a sound section, the section detection unit 51 outputs the information on the sound section to the section buffer 52 and the pattern detection unit 53. Based on the information on the sound section acquired from the section detection unit 51, the section buffer 52 extracts, from the audible sound data acquired from the filter unit 47, the audible sound data included in the sound section, and holds the extracted audible sound data of the sound section.
 In FIG. 13, upon acquiring the information on the sound section from the section detection unit 51, the pattern detection unit 53 determines whether the inaudible sound data in the extracted sound section satisfies a preset condition. If the pattern detection unit 53 determines that the condition is satisfied, it outputs a playback trigger to the playback unit 36.
 In FIG. 13, upon acquiring the playback trigger from the pattern detection unit 53, the playback unit 36 acquires the audible sound data of the latest sound section from among the audible sound data of the sound sections held in the section buffer 52, and causes the speaker 31 to output the acquired audible sound data.
 The other configurations and effects of the sound collector 103 are the same as or similar to those of the sound collector 3 and the processing device 4 shown in FIG. 4. When the sound collector 3 and the processing device 104 shown in FIG. 10 are configured as an integrated device, the control unit 34 of the sound collector, similarly to the sound collector 103, may include a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61.
 一実施形態において、(1)処理装置は、
 可聴域の音の情報を非可聴域の音の成分に含む外部音のデータを取得すると、前記非可聴域の音の成分によって、前記可聴域における音を検出する制御部を備える。
In one embodiment, (1) the processing device includes:
A control unit is provided that detects a sound in the audible range based on the sound component in the inaudible range when external sound data including information on a sound in the audible range is included in a sound component in the inaudible range.
 (2)上記(1)に記載の処理装置において、
 前記制御部は、前記非可聴域の音の成分によって、前記可聴域において音が続く音区間を検出してもよい。
(2) In the processing device described in (1) above,
The control unit may detect a sound section in which the sound continues in the audible range based on the component of the sound in the inaudible range.
 (3)上記(1)又は(2)に記載の処理装置において、
 前記制御部は、前記音区間の可聴域の音のデータを保持してもよい。
(3) In the processing device described in (1) or (2) above,
The control unit may hold data of sounds in an audible range in the sound section.
 (4)上記(1)から(3)までの何れか1つに記載の処理装置において、
 前記制御部は、前記非可聴域の音の成分が予め設定された条件を満たす場合、ユーザに前記条件を満たす音が検出されたことを通知してもよい。
(4) In the processing device according to any one of (1) to (3) above,
If the component of the sound in the inaudible range satisfies a preset condition, the control unit may notify the user that a sound satisfying the condition has been detected.
 (5)上記(4)に記載の処理装置において、
 入力部をさらに備え、
 前記制御部は、前記条件を満たす音が検出されたことを通知した後、再生指示を前記入力部によって受け付けると、前記音区間の可聴域の音のデータを再生する再生処理を実行してもよい。
(5) In the processing device according to (4) above,
It further includes an input section,
When the input unit receives a playback instruction after notifying that a sound satisfying the condition has been detected, the control unit executes a playback process to play back sound data in the audible range of the sound interval. good.
 (6)上記(1)から(5)までの何れか1つに記載の処理装置において、
 前記制御部は、前記非可聴域の音の成分が予め設定された条件を満たす場合、前記可聴域の音を再生する再生処理を実行してよい。
(6) In the processing device according to any one of (1) to (5) above,
The control unit may execute a reproduction process of reproducing the sound in the audible range when the component of the sound in the inaudible range satisfies a preset condition.
 (7)上記(1)から(6)までの何れか1つに記載の処理装置において、
 前記可聴域の音の情報は、前記可聴域の音の基本周波数の情報であってもよい。
(7) In the processing device according to any one of (1) to (6) above,
The information on the sound in the audible range may be information on the fundamental frequency of the sound in the audible range.
 (8)上記(4)から(7)までの何れか1つに記載の処理装置において、
 前記条件は、前記非可聴域の音の周波数スペクトル信号の少なくとも一部が設定パターンと一致するとの条件であり、
 前記設定パターンは、予め設定された可聴音のデータに基づいて生成されてもよい。
(8) In the processing device according to any one of (4) to (7) above,
The condition is that at least a part of the frequency spectrum signal of the inaudible sound matches a set pattern,
The setting pattern may be generated based on preset audible sound data.
 (9)上記(2)又は(3)に記載の処理装置において、
 前記可聴域の音の情報は、前記可聴域の音の基本周波数の情報であり、
 前記制御部は、
 前記音区間の前記非可聴域の音のデータから前記基本周波数の情報を抽出し、
 前記音区間の前記可聴域の音のうち、前記基本周波数の整数倍となる周波数の音のパワーを強めてもよい。
(9) In the processing device according to (2) or (3) above,
The information on the sound in the audible range is information on the fundamental frequency of the sound in the audible range,
The control unit includes:
extracting information on the fundamental frequency from data of the inaudible range sound in the sound interval;
Among the sounds in the audible range of the sound section, the power of sounds with frequencies that are integral multiples of the fundamental frequency may be increased.
 (10)上記(9)に記載の処理装置において、
 前記制御部は、
 前記音区間以外の区間に含まれる前記可聴域の音のデータに基づいて雑音成分の情報を抽出し、
 前記音区間の前記可聴域の音のデータから前記雑音成分を除去してもよい。
(10) In the processing device according to (9) above,
The control unit includes:
extracting noise component information based on data of the audible range sound included in an interval other than the sound interval;
The noise component may be removed from the sound data in the audible range of the sound section.
 (11)上記(9)又は(10)に記載の処理装置において、
 前記制御部は、
 前記音区間の前記可聴域の音のデータに対して音声認識処理を実行することにより、前記音区間の前記可聴域の音のテキストデータを取得し、
 取得した前記テキストデータに予め設定されたキーワードが含まれる場合、前記音区間の可聴域の音のデータを再生する再生処理を実行してもよい。
(11) In the processing device according to (9) or (10) above,
The control unit includes:
Obtaining text data of the sound in the audible range of the sound interval by performing speech recognition processing on the data of the sound in the audible range of the sound interval,
If the acquired text data includes a preset keyword, a reproduction process may be performed to reproduce sound data in the audible range of the sound section.
 (12)上記(9)から(11)までの何れか1つに記載の処理装置において、
 前記制御部は、
 前記音区間の前記可聴域の音のデータに対して音声認識処理を実行することにより、前記音区間の前記可聴域の音のテキストデータを取得し、
 取得した前記テキストデータに予め設定されたキーワードが含まれる場合、ユーザに前記キーワードを含む音が検出されたことを通知してもよい。
(12) In the processing device according to any one of (9) to (11) above,
The control unit includes:
Obtaining text data of the sound in the audible range of the sound interval by performing speech recognition processing on the data of the sound in the audible range of the sound interval,
If the acquired text data includes a preset keyword, the user may be notified that a sound including the keyword has been detected.
 (13)上記(1)から(12)までの何れか1つに記載の処理装置において、
 フィルタ部をさらに備え、
 前記制御部は、前記外部音のデータを取得すると、前記フィルタ部によって前記外部音を前記非可聴域の音のデータと前記可聴域の音のデータとに分けてもよい。
(13) In the processing device according to any one of (1) to (12) above,
Further comprising a filter section,
Upon acquiring the external sound data, the control unit may divide the external sound into data of the inaudible range sound and data of the audible range sound using the filter unit.
 一実施形態において、(14)出力装置は、
 スピーカと、
 可聴域の音のデータを受け付けると、前記可聴域の音の情報を成分に含む非可聴域の音を生成し、前記可聴域の音と前記非可聴域の音とを重畳させた音を前記スピーカによって出力する制御部と、
 を備える。
In one embodiment, (14) the output device includes:
speaker and
When the data of the sound in the audible range is received, a sound in the inaudible range that includes the information on the sound in the audible range is generated, and a sound in which the sound in the audible range and the sound in the inaudible range are superimposed is generated. a control unit that outputs output through a speaker;
Equipped with
In one embodiment, (15) a processing system includes:
an output device that, upon receiving audible-range sound data, generates an inaudible-range sound containing the information of the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and
a processing device that, upon acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
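On the processing-device side of item (15), the simplest detection strategy consistent with the above is to watch the energy of the inaudible band and mark a sound section wherever it stays above a threshold. The sketch below assumes the band split of item (13) has already been applied; the frame length and threshold are illustrative.

```python
import numpy as np

def detect_sound_sections(inaudible, frame_len=1024, threshold=1e-4):
    """Return (begin, end) sample ranges where the inaudible band is active."""
    sections, start = [], None
    for i in range(0, len(inaudible) - frame_len + 1, frame_len):
        active = np.mean(inaudible[i:i + frame_len] ** 2) > threshold
        if active and start is None:
            start = i                      # a sound section begins
        elif not active and start is not None:
            sections.append((start, i))    # the section ends
            start = None
    if start is not None:
        sections.append((start, len(inaudible)))
    return sections
```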
In the present disclosure, designations such as "first" and "second" are identifiers used to distinguish the corresponding elements. Elements distinguished by "first", "second", and the like in the present disclosure may have their identifiers exchanged; for example, the first extraction unit 56 and the second extraction unit 57 may exchange the identifiers "first" and "second". The identifiers are exchanged simultaneously, and the elements remain distinct after the exchange. Identifiers may also be removed, in which case the elements are distinguished by their reference signs. The mere use of identifiers such as "first" and "second" in the present disclosure shall not be used to interpret the order of the elements or as grounds for assuming that an element with a lower-numbered identifier exists.
 1 processing system
 2 output device
 3, 103 sound collector
 4, 104 processing device
 10 input unit
 11 input unit
 12 conversion unit
 13 switch
 14 delay buffer
 15 superimposing unit
 16 speaker
 17 storage unit
 18 control unit
 19 extraction unit
 20 generation unit
 30 microphone
 31 speaker
 32 communication unit
 33 storage unit
 34 control unit
 35 acquisition unit
 36 reproduction unit
 37 accumulation unit
 40 communication unit
 41 input unit
 42 notification unit
 43 display unit
 44 vibration unit
 45 light-emitting unit
 46 storage unit
 47 filter unit
 50, 150 control unit
 51 section detection unit
 52 section buffer
 53 pattern detection unit
 54 reproduction unit
 55 buffer
 56 first extraction unit
 57 second extraction unit
 58 removal unit
 59 emphasis unit
 60 conversion unit
 61 recognition unit

Claims (15)

  1.  A processing device comprising a control unit that, upon acquiring external sound data in which information of an audible-range sound is contained in an inaudible-range sound component, detects the sound in the audible range based on the inaudible-range sound component.
  2.  The processing device according to claim 1, wherein the control unit detects, based on the inaudible-range sound component, a sound section in which sound continues in the audible range.
  3.  The processing device according to claim 2, wherein the control unit holds the audible-range sound data of the sound section.
  4.  The processing device according to claim 3, wherein, when the inaudible-range sound component satisfies a preset condition, the control unit notifies a user that a sound satisfying the condition has been detected.
  5.  The processing device according to claim 4, further comprising an input unit,
      wherein, after notifying that a sound satisfying the condition has been detected, the control unit executes a reproduction process of reproducing the audible-range sound data of the sound section when a reproduction instruction is received through the input unit.
  6.  The processing device according to claim 1, wherein the control unit executes a reproduction process of reproducing the audible-range sound when the inaudible-range sound component satisfies a preset condition.
  7.  The processing device according to any one of claims 4 to 6, wherein the information of the audible-range sound is information on a fundamental frequency of the audible-range sound.
  8.  The processing device according to claim 7, wherein the condition is that at least a part of a frequency spectrum signal of the inaudible-range sound matches a set pattern, and
      the set pattern is generated based on preset audible sound data.
  9.  The processing device according to claim 2, wherein the information of the audible-range sound is information on a fundamental frequency of the audible-range sound, and
      the control unit extracts the information on the fundamental frequency from the inaudible-range sound data of the sound section, and
      increases the power of those sounds, among the audible-range sounds of the sound section, whose frequencies are integral multiples of the fundamental frequency.
  10.  The processing device according to claim 9, wherein the control unit
      extracts information on a noise component based on the audible-range sound data contained in sections other than the sound section, and
      removes the noise component from the audible-range sound data of the sound section.
  11.  The processing device according to claim 9 or 10, wherein the control unit
      obtains text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and
      executes a reproduction process of reproducing the audible-range sound data of the sound section when the obtained text data contains a preset keyword.
  12.  The processing device according to claim 9 or 10, wherein the control unit
      obtains text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and
      notifies a user that a sound containing the keyword has been detected when the obtained text data contains a preset keyword.
  13.  The processing device according to claim 1 or 9, further comprising a filter unit,
      wherein, upon acquiring the external sound data, the control unit uses the filter unit to separate the external sound into inaudible-range sound data and audible-range sound data.
  14.  An output device comprising:
      a speaker; and
      a control unit that, upon receiving audible-range sound data, generates an inaudible-range sound containing the information of the audible-range sound as a component, and outputs through the speaker a sound in which the audible-range sound and the inaudible-range sound are superimposed.
  15.  A processing system comprising:
      an output device that, upon receiving audible-range sound data, generates an inaudible-range sound containing the information of the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and
      a processing device that, upon acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
PCT/JP2023/033103 2022-09-15 2023-09-11 Processing device, output device, and processing system WO2024058147A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-147250 2022-09-15
JP2022147250 2022-09-15

Publications (1)

Publication Number Publication Date
WO2024058147A1 true WO2024058147A1 (en) 2024-03-21

Family

ID=90274996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/033103 WO2024058147A1 (en) 2022-09-15 2023-09-11 Processing device, output device, and processing system

Country Status (1)

Country Link
WO (1) WO2024058147A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000207170A (en) * 1999-01-14 2000-07-28 Sony Corp Device and method for processing information
US20140270194A1 (en) * 2013-03-12 2014-09-18 Comcast Cable Communications, Llc Removal of audio noise
JP2018185401A (en) * 2017-04-25 2018-11-22 トヨタ自動車株式会社 Voice interactive system and voice interactive method
WO2021059497A1 (en) * 2019-09-27 2021-04-01 日本電気株式会社 Audio signal processing device, audio signal processing method, and storage medium

Similar Documents

Publication Publication Date Title
US10685638B2 (en) Audio scene apparatus
KR101606966B1 (en) Systems, methods, apparatus, and computer-readable media for spatially selective audio augmentation
CN108156550B (en) Playing method and device of headset
US10224019B2 (en) Wearable audio device
JP5493611B2 (en) Information processing apparatus, information processing method, and program
US11948561B2 (en) Automatic speech recognition imposter rejection on a headphone with an accelerometer
CN110708625A (en) Intelligent terminal-based environment sound suppression and enhancement adjustable earphone system and method
JP6931819B2 (en) Voice processing device, voice processing method and voice processing program
CN112352441B (en) Enhanced environmental awareness system
JP2015206989A (en) Information processing device, information processing method, and program
JP2016535305A (en) A device for improving language processing in autism
JP6268033B2 (en) Mobile device
JPWO2008007616A1 (en) Non-voice utterance input warning device, method and program
WO2024058147A1 (en) Processing device, output device, and processing system
JP7284570B2 (en) Sound reproduction system and program
CN103295571A (en) Control using time and/or spectrally compacted audio commands
CN110782887A (en) Voice signal processing method, system, device, equipment and computer storage medium
JP2007187748A (en) Sound selective processing device
US10805710B2 (en) Acoustic device and acoustic processing method
US20230239617A1 (en) Ear-worn device and reproduction method
US20200111505A1 (en) Information processing apparatus and information processing method
JP2012194295A (en) Speech output system
JP6766981B2 (en) Broadcast system, terminal device, broadcasting method, terminal device operation method, and program
CN115580678A (en) Data processing method, device and equipment
CN113823278A (en) Voice recognition method and device, electronic equipment and storage medium