WO2024058147A1 - Processing device, output device, and processing system - Google Patents

Processing device, output device, and processing system

Info

Publication number
WO2024058147A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound, section, data, unit, audible
Application number
PCT/JP2023/033103
Other languages
French (fr)
Japanese (ja)
Inventor
利知 金岡
Original Assignee
京セラ株式会社 (Kyocera Corporation)
Application filed by 京セラ株式会社 (Kyocera Corporation)
Publication of WO2024058147A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/84: Detection of presence or absence of voice signals for discriminating voice from noise

Definitions

  • the present disclosure relates to a processing device, an output device, and a processing system.
  • Patent Document 1 describes a portable music playback device that includes a notification unit that notifies the user through headphones when an external sound matches a predetermined phrase.
  • a processing device includes a control unit that, when acquiring external sound data in which a sound component in the inaudible range includes information on a sound in the audible range, detects the sound in the audible range based on the sound component in the inaudible range.
  • an output device includes a speaker and a control unit that, upon receiving data of an audible sound, generates an inaudible sound containing information of the audible sound as a component and outputs, through the speaker, a sound in which the audible sound and the inaudible sound are superimposed on each other.
  • a processing system includes an output device that, upon receiving data of a sound in the audible range, generates a sound in the inaudible range that includes information on the sound in the audible range as a component and outputs a sound in which the sound in the audible range and the sound in the inaudible range are superimposed; and a processing device that, when external sound data is acquired, detects the sound in the audible range based on a component of the sound in the inaudible range.
  • FIG. 1 is a diagram showing a schematic configuration of a processing system according to an embodiment of the present disclosure.
  • FIG. 2 is a block diagram of the output device shown in FIG. 1.
  • FIG. 3 is a diagram for explaining frame division.
  • FIG. 4 is a block diagram of the sound collector and processing device shown in FIG. 1.
  • FIG. 5 is a graph of a frequency spectrum signal.
  • FIG. 6 is a flowchart showing the flow of sound output processing executed by the output device shown in FIG. 2.
  • FIG. 7 is a flowchart showing the flow of sound collection processing executed by the sound collector shown in FIG. 4.
  • FIG. 8 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
  • FIG. 9 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
  • FIG. 10 is a block diagram of a processing device according to another embodiment.
  • FIG. 11 is a diagram for explaining processing for emphasizing overtones.
  • FIG. 12 is a flowchart showing the flow of sound acquisition processing executed by the processing device according to another embodiment.
  • FIG. 13 is a block diagram of a modification of the sound collector (processing device) shown in FIG. 4.
  • FIG. 14 is a block diagram of a modification of the sound collector (processing device) shown in FIG. 4.
  • a user may use a processing device, such as headphones or earphones, in noisy conditions. In that case, it may become difficult for the processing device to distinguish the voice to be detected from the noise.
  • according to the present disclosure, a technique for detecting sound with high accuracy can be provided.
  • the processing system 1 includes an output device 2, a sound collector 3, and a processing device 4.
  • the sound collector 3 and the processing device 4 are configured as separate devices.
  • the sound collector 3 and the processing device 4 may be configured as an integrated device as shown in FIG. 13, which will be described later.
  • the output device 2 is used, for example, in an airport waiting room or a station premises.
  • the output device 2 may be part of a public address system installed in a building.
  • the output device 2 outputs audible sound.
  • the output device 2 outputs an audible announcement sound for notifying the estimated arrival time of an airplane or train.
  • Audible sounds are sounds that can be heard by the average human ear.
  • the frequency band of the audible sound, that is, the audible range is, for example, a band of 20 [Hz] to 18 [kHz].
  • the output device 2 outputs inaudible sounds along with audible sounds.
  • Inaudible sounds are sounds that cannot be heard by the average human ear.
  • the frequency band of the inaudible sound is, for example, a band of 20 [Hz] or less, a band of 18 [kHz] to 22 [kHz], or a band of 22 [kHz] or more.
  • the output device 2 outputs inaudible sound in a band of 18 [kHz] to 22 [kHz].
  • the inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Since the inaudible sound component includes audible sound information, the processing device 4 can accurately detect the audible sound output by the output device 2 even under noisy conditions, as will be described later.
  • the sound collector 3 is, for example, an earphone. However, the sound collector 3 is not limited to earphones.
  • the sound collector 3 may be headphones or the like.
  • the sound collector 3 is worn by the user.
  • the sound collector 3 can output music and the like to the user.
  • the sound collector 3 may include an earphone section that is attached to the user's left ear, and an earphone section that is attached to the user's right ear.
  • the sound collector 3 collects external sounds around the sound collector 3. External sound is sound emitted outside the sound collector 3.
  • the sound collector 3 collects external sounds around the user by being worn by the user. That is, external sounds include sounds emitted around the user. External sounds may include sounds made by the user himself.
  • the sound collector 3 outputs the collected external sounds to the user under the control of the processing device 4. With this configuration, the user can hear the external sounds around them while wearing the sound collector 3.
  • the processing device 4 is, for example, a smartphone, a mobile phone, a tablet, or a personal computer (PC). However, the processing device 4 is not limited to this.
  • the processing device 4 is operated by a user.
  • the user can operate the processing device 4 to make settings for the sound collector 3 and the like.
  • the processing device 4 acquires data on external sounds collected by the sound collector 3.
  • the external sound collected by the sound collector 3 may include noise in addition to the audible sound and inaudible sound output by the output device 2.
  • inaudible sounds are rare in nature compared to audible sounds; most noise is therefore in the audible range. As a result, the inaudible sound output by the output device 2 is not easily affected by noise.
  • the inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Therefore, by analyzing the inaudible component of the external sound collected by the sound collector 3, the processing device 4 can accurately detect the audible sound output by the output device 2 even if the external sound includes noise.
  • the output device 2 includes an input section 10, an input section 11, a conversion section 12, a switch 13, a delay buffer 14, a superimposition section 15, a speaker 16, a storage section 17, and a control unit 18.
  • digital sound data refers to data obtained by sampling analog sound data at a preset sampling rate.
  • analog sound data refers to sound data as collected by a microphone or the like.
  • the input unit 10 can receive text data input from the user.
  • the input unit 10 includes at least one input interface that can accept input of text data.
  • the input interface includes, for example, a keyboard.
  • the input unit 10 may receive text data input from another device.
  • the input unit 10 may be configured to include at least one connection interface connectable to other devices.
  • the connection interface is an interface compatible with standards such as USB (Universal Serial Bus), HDMI (registered trademark) (High-Definition Multimedia Interface), or Bluetooth (registered trademark).
  • the input unit 11 can receive input of audible sound data from other devices. The audible sound data input from the input unit 11 is assumed to be digital data.
  • the input unit 11 includes at least one connection interface connectable to other devices. The connection interface may be the same as or similar to that of the input unit 10.
  • the conversion unit 12 acquires text data from the input unit 10 under the control of the control unit 18.
  • the converter 12 converts text data into audible sound data under the control of the controller 18 .
  • the conversion unit 12 converts text data into audible sound data by text-to-speech synthesis. Speech data used for text-to-speech synthesis may be stored in the storage unit 17. It is assumed that the audible sound data after conversion is digital sound data.
  • the switch 13 is connected between the conversion section 12, the input section 11, the delay buffer 14, and the control section 18.
  • the switch 13 switches the electrical connection relationship between the input section 11 , the conversion section 12 , the delay buffer 14 , and the control section 18 based on the control of the control section 18 .
  • the switch 13 includes, for example, an arbitrary switching element such as a transistor.
  • the delay buffer 14 is a temporary storage memory.
  • the delay buffer 14 acquires audible sound data from the switch 13 under the control of the control unit 18 .
  • the delay buffer 14 holds the acquired audible sound data for a predetermined period of time.
  • the predetermined time is, for example, the time from when the extraction unit 19 (described later) acquires audible sound data from the switch 13 until when the generation unit 20 (described below) outputs inaudible sound data to the superimposition unit 15.
  • after holding the audible sound data for the predetermined time, the delay buffer 14 outputs the audible sound data to the superimposition unit 15.
  • the delay buffer 14 is configured to include the same or similar components as the storage unit 17, which will be described later.
  • the delay buffer 14 may be part of the storage unit 17.
  • the superimposition unit 15 acquires audible sound data from the delay buffer 14 under the control of the control unit 18.
  • the superimposition unit 15 acquires inaudible sound data from the control unit 18 under the control of the control unit 18 .
  • the superimposing section 15 superimposes the audible sound data from the delay buffer 14 and the inaudible sound data from the control section 18 under the control of the control section 18 .
  • the superimposing unit 15 outputs to the speaker 16 sound data in which audible sound data and inaudible sound data are superimposed.
  • the speaker 16 is capable of outputting sound.
  • the speaker 16 is, for example, a loudspeaker that can convert electrical signals into sound.
  • the speaker 16 acquires sound data, which is an electrical signal, from the superimposing section 15 under the control of the control section 18 . Under the control of the control unit 18, the speaker 16 converts the acquired sound data into sound and outputs the sound.
  • the storage unit 17 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
  • the semiconductor memory is, for example, RAM (Random Access Memory) or ROM (Read Only Memory).
  • the RAM is, for example, SRAM (Static Random Access Memory) or DRAM (Dynamic Random Access Memory).
  • the ROM is, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory).
  • the storage unit 17 may function as a main storage device, an auxiliary storage device, or a cache memory.
  • the storage unit 17 stores data used for the operation of the output device 2 and data obtained by the operation of the output device 2.
  • the storage unit 17 stores audio data used by the conversion unit 12 for text-to-speech synthesis.
  • the control unit 18 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • the control unit 18 executes processing related to the operation of the output device 2 while controlling each part of the output device 2 .
  • the control unit 18 acquires audible sound data from the input unit 11 or the conversion unit 12 via the switch 13. Upon acquiring the audible sound data, the control unit 18 generates inaudible sound data that includes audible sound information as a sound component. In order to execute this process, in this embodiment, the control unit 18 includes an extraction unit 19 and a generation unit 20. Details of the processing by the extraction unit 19 and the generation unit 20 will be described later.
  • the user inputs text data of an announcement sound that he wants to output from the output device 2 from the input unit 10.
  • the user inputs data of an audible sound that is an announcement sound that the user wants to output from the output device 2 from the input unit 11 .
  • when the text data of the announcement sound is input from the input unit 10, the control unit 18 receives the input of the text data from the input unit 10. The control unit 18 causes the conversion unit 12 to convert the received text data into audible sound data. The control unit 18 electrically connects the conversion unit 12, the delay buffer 14, and the extraction unit 19 using the switch 13. With such a configuration, the audible sound data converted by the conversion unit 12 is output from the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13.
  • when audible sound data, which is an announcement sound, is input from the input unit 11, the control unit 18 receives the input of the audible sound data from the input unit 11. The control unit 18 then electrically connects the input unit 11 with the delay buffer 14 and the extraction unit 19 using the switch 13. With this configuration, the audible sound data received from the input unit 11 is output from the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13.
  • the extraction unit 19 acquires audible sound data from the input unit 11 or the conversion unit 12 via the switch 13.
  • the extraction unit 19 extracts audible sound information to be included in the inaudible sound component output from the output device 2 from the acquired audible sound data.
  • the extraction unit 19 extracts the fundamental frequency of the audible sound as the audible sound information to be included in the inaudible sound component.
  • the extraction unit 19 extracts the fundamental frequency from the audible sound data by short-time Fourier transform (STFT).
  • the extraction unit 19 may extract the fundamental frequency from the audible sound data using any method. This processing by the extraction unit 19 will be explained below.
  • the extraction unit 19 divides the audible sound data into frames having a predetermined frame length. For example, the extraction unit 19 divides the audible sound data such that one frame contains hundreds to thousands of sound samples. The extraction unit 19 divides the audible sound data into frames at intervals of 1/2 to 1/4 of the frame length. This will be explained below with reference to FIG. 3.
  • the left side of Figure 3 shows a graph of audible sound data.
  • the horizontal axis of this graph is time.
  • in the example of FIG. 3, the frame length is L1.
  • the extraction unit 19 divides the audible sound data into frames every 1 ⁇ 3 of the frame length L1.
  • the extraction unit 19 obtains frames Fr1, Fr2, and Fr3 as shown on the right side of FIG. 3 by dividing the audible sound data into frames every 1 ⁇ 3 of the frame length L1.
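As a rough illustration of this frame division, the following Python sketch divides sampled sound data into overlapping frames; the 1024-sample frame length and the 1/3-frame hop are assumed values chosen to match the FIG. 3 example, not values prescribed by the embodiment.

```python
# A minimal sketch of the frame division described above, assuming a
# 1024-sample frame length L1 and a hop of L1/3 (hypothetical values).
import numpy as np

def split_into_frames(samples: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Divide sampled sound data into overlapping frames (Fr1, Fr2, Fr3, ...)."""
    n_frames = 1 + max(0, (len(samples) - frame_len) // hop)
    return np.stack([samples[i * hop : i * hop + frame_len] for i in range(n_frames)])

# Example: hundreds-to-thousands of samples per frame, shifted by L1/3 as in FIG. 3.
audible = np.random.randn(48_000)        # stand-in for one second at 48 kHz
frames = split_into_frames(audible, frame_len=1024, hop=1024 // 3)
```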
  • the extraction unit 19 extracts the fundamental frequency from the audible sound sampling data included in the frame for each frame.
  • the extraction unit 19 may extract the fundamental frequency based on the period at which the autocorrelation function of the audible sound samples included in the frame peaks.
  • the extraction unit 19 may extract the fundamental frequency using the cepstral method.
  • the extraction unit 19 outputs fundamental frequency data extracted for each frame to the generation unit 20.
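The autocorrelation-based extraction mentioned above can be sketched as follows; the 80-400 Hz voice-pitch search range and the 48 kHz sampling rate are illustrative assumptions, not values from the embodiment.

```python
# A sketch of autocorrelation pitch extraction: the fundamental frequency is
# taken from the lag at which the autocorrelation of a frame peaks.
import numpy as np

def fundamental_frequency(frame: np.ndarray, fs: int,
                          f_min: float = 80.0, f_max: float = 400.0) -> float:
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    lo, hi = int(fs / f_max), int(fs / f_min)    # lag range for voice pitch
    lag = lo + int(np.argmax(ac[lo:hi]))         # lag of the strongest peak
    return fs / lag

# e.g. a 220 Hz test tone in one 1024-sample frame
tone = np.sin(2 * np.pi * 220 * np.arange(1024) / 48_000)
print(fundamental_frequency(tone, fs=48_000))    # ~220 Hz
```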
  • the generation unit 20 acquires fundamental frequency data for each frame from the extraction unit 19.
  • the generation unit 20 generates inaudible sound data that includes a fundamental frequency, which is audible sound information, as a sound component.
  • the generation unit 20 generates a sine wave for each frame based on fundamental frequency data.
  • This sine wave is inaudible sound data.
  • inaudible sound data to be output from the output device 2 is generated.
  • the generation unit 20 generates a sine wave x(t) at time t using equation (1):
  • x(t) = A sin{2πt × (f0 + X)} … (1)
  • by equation (1), a sine wave x(t) having the fundamental frequency f0 as a component can be generated.
  • the amplitude A may be set based on the intensity of audible sound data.
  • the constant X may be set based on the inaudible range used in the processing system 1.
  • the constant X is, for example, 18 [kHz].
  • after generating a sine wave for each frame, the generation unit 20 synthesizes the sine waves of the frames to generate the inaudible sound data.
  • the generation unit 20 may multiply the sine wave of each frame by a window function so that the frame ends are tapered, and then synthesize the sine waves of the frames.
  • the window function is, for example, a Hamming window function.
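Putting equation (1), the window function, and the synthesis of the per-frame sine waves together, a minimal sketch might look like the following; the amplitude, frame geometry, and 48 kHz rate are assumptions, with X = 18 [kHz] as in the example above.

```python
# A minimal sketch combining equation (1), a Hamming window, and overlap-add
# synthesis of the per-frame sine waves into inaudible sound data.
import numpy as np

FS, X = 48_000, 18_000.0                 # sampling rate and constant X [Hz]

def inaudible_from_f0(f0_list, frame_len=1024, hop=1024 // 3, amp=0.1):
    out = np.zeros(hop * (len(f0_list) - 1) + frame_len)
    window = np.hamming(frame_len)       # tapers the frame ends
    for i, f0 in enumerate(f0_list):
        t = (np.arange(frame_len) + i * hop) / FS       # absolute time axis
        sine = amp * np.sin(2 * np.pi * t * (f0 + X))   # equation (1)
        out[i * hop : i * hop + frame_len] += window * sine
    return out

inaudible = inaudible_from_f0([220.0, 225.0, 230.0])    # one sine per frame
```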
  • after generating the inaudible sound data, the generation unit 20 outputs the generated inaudible sound data to the superimposition unit 15.
  • the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data.
  • the control unit 18 causes the superimposition unit 15 to output the generated sound data to the speaker 16.
  • the control unit 18 causes the speaker 16 to convert the sound data from the superimposing unit 15 into sound and output it.
  • the sound collector 3 includes a microphone 30, a speaker 31, a communication section 32, a storage section 33, and a control section 34.
  • in FIG. 4, the main flow of data is shown by solid lines.
  • the microphone 30 is capable of collecting external sounds around the sound collector 3.
  • the microphone 30 includes a left microphone and a right microphone.
  • the left microphone may be included in an earphone unit included in the sound collector 3 that is attached to the user's left ear.
  • the right microphone may be included in an earphone unit, included in the sound collector 3, that is attached to the user's right ear.
  • the microphone 30 is a stereo microphone or the like.
  • the speaker 31 is capable of outputting sound.
  • the speaker 31 includes a left speaker and a right speaker.
  • the left speaker may be included in an earphone unit included in the sound collector 3 that is attached to the user's left ear.
  • the right speaker may be included in an earphone unit, included in the sound collector 3, that is attached to the user's right ear.
  • the speaker 31 is a stereo speaker or the like.
  • the communication unit 32 includes at least one communication module that can communicate with the processing device 4 via a communication line.
  • the communication module is a communication module compatible with communication line standards.
  • the standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard including Bluetooth (registered trademark), infrared rays, NFC (Near Field Communication), and the like.
  • the storage unit 33 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
  • the semiconductor memory is, for example, RAM or ROM.
  • the RAM is, for example, SRAM or DRAM.
  • the ROM is, for example, an EEPROM.
  • the storage unit 33 may function as a main storage device, an auxiliary storage device, or a cache memory.
  • the storage unit 33 stores data used for the operation of the sound collector 3 and data obtained by the operation of the sound collector 3.
  • the storage unit 33 stores system programs, application programs, embedded software, and the like.
  • the control unit 34 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU or GPU, or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA or an ASIC.
  • the control unit 34 executes processing related to the operation of the sound collector 3 while controlling each part of the sound collector 3.
  • the control unit 34 includes an acquisition unit 35, a playback unit 36, and a storage unit 37.
  • the storage section 37 is configured to include the same or similar components as the storage section 33. At least a portion of the storage section 37 may be a portion of the storage section 33. The operation of the storage section 37 is executed by the processor of the control section 34 or the like.
  • the acquisition unit 35 acquires external sound digital data from the external sound analog data collected by the microphone 30. For example, the acquisition unit 35 acquires digital data of external sound by sampling analog data of external sound at a preset sampling rate.
  • the acquisition unit 35 outputs the digital data of the external sound to the reproduction unit 36. Further, the acquisition unit 35 transmits the digital data of the external sound to the processing device 4 through the communication unit 32 .
  • the acquisition unit 35 may acquire digital data of the left external sound from analog data of the external sound collected by the left microphone. Further, the acquisition unit 35 may acquire digital data of the right external sound from analog data of the external sound collected by the right microphone. The acquisition unit 35 may output the left external sound digital data and the right external sound digital data to the playback unit 36 . The acquisition unit 35 may transmit the left external sound digital data and the right external sound digital data to the processing device 4 through the communication unit 32.
  • when the left external sound digital data and the right external sound digital data are not particularly distinguished, they will also be simply referred to as "external sound digital data."
  • the reproduction unit 36 acquires digital data of external sound from the acquisition unit 35.
  • the playback unit 36 receives the replay flag from the processing device 4 through the communication unit 32 .
  • the replay flag is set to True or False by the processing device 4, as will be described later. When the replay flag is False, the sound collector 3 operates in a through mode; when it is True, the sound collector 3 operates in a reproduction mode.
  • the through mode is a mode in which the external sound collected by the sound collector 3 is output from the sound collector 3 to the user without going through the processing device 4 .
  • when the sound collector 3 operates in the through mode, the user can hear surrounding external sounds while wearing the sound collector 3.
  • the reproduction mode is a mode in which reproduction data acquired by the sound collector 3 from the processing device 4 is output from the speaker 31.
  • in the through mode, the reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35.
  • when the reproduction unit 36 acquires the left and right external sound digital data from the acquisition unit 35, it may output the left external sound digital data to the left speaker of the speaker 31 and the right external sound digital data to the right speaker of the speaker 31.
  • in the reproduction mode, the playback unit 36 causes the speaker 31 to output the playback data accumulated in the storage unit 37.
  • the storage unit 37 receives playback data from the processing device 4 through the communication unit 32.
  • the storage unit 37 holds the received reproduction data.
  • the storage unit 37 may receive the left reproduction data and the right reproduction data from the processing device 4 and hold them.
  • the playback unit 36 may output the left playback data stored in the storage unit 37 to the left speaker of the speaker 31, and the right playback data stored in the storage unit 37 to the right speaker of the speaker 31.
  • the processing device 4 includes a communication section 40, an input section 41, a notification section 42, a storage section 46, a filter section 47, and a control section 50.
  • the communication unit 40 is configured to include at least one communication module that can communicate with the sound collector 3 via a communication line.
  • the communication module is a communication module compatible with communication line standards.
  • the standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard including Bluetooth (registered trademark), infrared rays, NFC, and the like.
  • the input unit 41 can accept input from the user.
  • the input unit 41 includes at least one input interface that can accept input from a user.
  • the input interface is, for example, a physical key, a capacitive key, a pointing device, a touch screen provided integrally with the display of the display unit 43, a microphone, or the like.
  • the notification unit 42 can notify the user of information.
  • the notification section 42 includes a display section 43, a vibration section 44, and a light emitting section 45.
  • the components included in the notification unit 42 are not limited to these.
  • the notification unit 42 may include any component that can notify the user of information.
  • the display unit 43 can display data.
  • the display unit 43 notifies the user of information according to the data by displaying the data.
  • the display unit 43 is, for example, a display.
  • the display is, for example, an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display.
  • the vibration unit 44 can vibrate the processing device 4.
  • the vibrating unit 44 notifies the user of information according to the vibration mode by vibrating the processing device 4.
  • the vibrating section 44 includes, for example, a vibrating element such as a piezoelectric element.
  • the light emitting section 45 is capable of emitting light.
  • the light emitting unit 45 notifies the user of information according to the light emission mode by emitting light.
  • the light emitting unit 45 includes, for example, an LED (Light Emitting Diode).
  • the storage unit 46 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
  • the semiconductor memory is, for example, RAM or ROM.
  • the RAM is, for example, SRAM or DRAM.
  • the ROM is, for example, an EEPROM.
  • the storage unit 46 may function as a main storage device, an auxiliary storage device, or a cache memory.
  • the storage unit 46 stores data used for the operation of the processing device 4 and data obtained by the operation of the processing device 4.
  • the storage unit 46 stores system programs, application programs, embedded software, and the like.
  • the storage unit 46 stores a setting pattern described below.
  • the filter unit 47 is configured to include, for example, a bandpass filter for the audible range through which only audible sound data can pass, and a bandpass filter for the inaudible range through which only inaudible sound data can pass.
  • a bandpass filter for the audible range can pass only sound data of, for example, 20 [Hz] to 10 [kHz].
  • the bandpass filter for the inaudible range can pass sound data of, for example, 18 [kHz] to 22 [kHz].
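A minimal sketch of such a filter unit, assuming the example passbands above, a 48 kHz sampling rate, and Butterworth band-pass filters (the filter type and order are not specified in the embodiment):

```python
# A sketch of the filter unit 47: one band-pass filter for the audible range
# (20 Hz - 10 kHz here) and one for the inaudible range (18 - 22 kHz).
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48_000
sos_audible = butter(8, [20, 10_000], btype="bandpass", fs=FS, output="sos")
sos_inaudible = butter(8, [18_000, 22_000], btype="bandpass", fs=FS, output="sos")

def split_external_sound(external: np.ndarray):
    """Separate external sound digital data into audible and inaudible data."""
    return sosfilt(sos_audible, external), sosfilt(sos_inaudible, external)
```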
  • the control unit 50 uses the filter unit 47 to separate the received external sound digital data into inaudible sound data and audible sound data.
  • the filter section 47 outputs the inaudible sound data to the section detection section 51 and the pattern detection section 53.
  • the filter section 47 outputs audible sound data to the section buffer 52.
  • the filter unit 47 may divide the left external sound digital data into left inaudible sound data and left audible sound data based on the control of the control unit 50.
  • the filter unit 47 may divide the digital data of the external sound for the right into inaudible sound data for the right and audible sound data for the right based on the control of the control unit 50.
  • the filter section 47 may output the data of the left and right inaudible sounds to the section detection section 51 and the pattern detection section 53.
  • the filter unit 47 may output left and right audible sound data to the section buffer 52 .
  • when the left audible sound data and the right audible sound data are not particularly distinguished, they will also be simply referred to as "audible sound data." Similarly, when the left inaudible sound data and the right inaudible sound data are not particularly distinguished, they are simply referred to as "inaudible sound data."
  • the control unit 50 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU or GPU, or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA or an ASIC.
  • the control unit 50 executes processing related to the operation of the processing device 4 while controlling each part of the processing device 4 .
  • the control section 50 includes a section detection section 51, a section buffer 52, a pattern detection section 53, and a reproduction section 54.
  • the section buffer 52 may be part of the storage section 46. The operation of the section buffer 52 is executed by the processor of the control unit 50 or the like.
  • the section detection section 51 acquires inaudible sound data from the filter section 47.
  • the section detection unit 51 detects a sound section in which a sound continues in the audible sound data based on the inaudible sound data.
  • the output device 2 generates inaudible sound data that includes audible sound information as a component. Therefore, when the audible sound data includes a sound section, the inaudible sound data corresponding to that section includes sound data other than noise. In the present embodiment, the section detection unit 51 therefore detects a sound section by determining whether or not the inaudible sound data includes sound data other than noise. An example of the processing of the section detection unit 51 will be described below.
  • the section detection unit 51 divides the inaudible sound data into frames having a predetermined frame length. For example, the section detection unit 51 divides the inaudible sound data such that one frame includes hundreds to thousands of sound samples. The section detection unit 51 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length, in the same or a similar manner to the process described above with reference to FIG. 3.
  • after dividing the inaudible sound data into frames, the section detection unit 51 determines for each frame, by the following process, whether or not the frame includes sound data other than noise.
  • the section detection unit 51 performs Fast Fourier Transform (FFT) on the sound data included in each frame, and obtains a frequency spectrum signal for each frame.
  • This frequency spectrum signal becomes a frequency spectrum signal sg1 as shown in FIG. 5, which will be described later.
  • This frequency spectrum signal corresponds to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2.
  • the section detection unit 51 calculates the power of the frequency spectrum signal for each frame by squaring the frequency spectrum signal, for example.
  • the section detection unit 51 determines whether the power of the frequency spectrum signal for each frame is greater than or equal to the power threshold.
  • the power threshold may be set in advance as a fixed value.
  • the section detection unit 51 may set a power threshold for each frame.
  • the section detection unit 51 may calculate the power threshold for each frame based on the statistical estimation of the noise power for each frame.
  • the statistical estimation is, for example, an average value or a variance value.
  • when the section detection unit 51 determines that the power of the frequency spectrum signal is equal to or greater than the power threshold, it determines that the frame corresponding to the frequency spectrum signal includes sound data other than noise. When it determines that the power is less than the power threshold, the section detection unit 51 determines that the frame does not include sound data other than noise.
  • the section detection unit 51 detects, as a sound section, a section in which a frame determined to include sound data other than noise continues.
  • the section detection unit 51 may detect a sound section by executing hangover processing.
  • the interval detection unit 51 may detect the sound interval using other methods.
  • the section detection unit 51 may detect a sound section by separating clusters of noise from clusters of sound data other than noise using a Gaussian Mixture Model (GMM).
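A minimal sketch of the basic power-threshold detection described above (the hangover and GMM variants are omitted); the fixed threshold is an assumption, and per the embodiment it may instead be estimated per frame from statistics (mean, variance) of the noise power:

```python
# A sketch of section detection: per-frame FFT power of the inaudible data is
# compared against a threshold, and runs of consecutive above-threshold frames
# become sound sections.
import numpy as np

def detect_sound_sections(inaudible_frames: np.ndarray, power_threshold: float):
    """Return (start, end) frame indices of sections with non-noise sound."""
    spectra = np.fft.rfft(inaudible_frames, axis=1)     # FFT per frame
    power = (np.abs(spectra) ** 2).sum(axis=1)          # power per frame
    voiced = power >= power_threshold                   # "True frames"
    sections, start = [], None
    for i, flag in enumerate(voiced):
        if flag and start is None:
            start = i                                   # sound section start
        elif not flag and start is not None:
            sections.append((start, i))                 # sound section end
            start = None
    if start is not None:
        sections.append((start, len(voiced)))
    return sections
```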
  • the section detection unit 51 may acquire left inaudible sound data and right inaudible sound data from the filter unit 47.
  • the section detection unit 51 may detect a sound section in which a sound continues in the audible sound data based on both the left and right inaudible sound data. For example, in the same manner as or similar to the above, the section detection unit 51 divides the left inaudible sound data into frames to obtain left frames, and obtains a frequency spectrum signal for each left frame. Furthermore, the section detection unit 51 divides the right inaudible sound data into frames to obtain right frames, and obtains a frequency spectrum signal for each right frame.
  • the section detection unit 51 determines whether the power of the frequency spectrum signal for each frame for the left and right is equal to or greater than the power threshold.
  • hereinafter, a frame in which the power of the frequency spectrum signal is equal to or greater than the power threshold is referred to as a "True frame."
  • for example, the section detection unit 51 determines whether at least one of the left frame and the right frame is a True frame.
  • if at least one of them is a True frame, the section detection unit 51 determines that the inaudible sound frame corresponding to the left and right frames includes sound data other than noise.
  • the inaudible sound frame corresponding to the left and right frames is obtained by regarding the left inaudible sound frame and the right inaudible sound frame as one frame.
  • alternatively, the section detection unit 51 may determine whether both the left frame and the right frame are True frames. In this case, if the section detection unit 51 determines that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames includes sound data other than noise. On the other hand, if the section detection unit 51 does not determine that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames does not include sound data other than noise.
  • when the section detection unit 51 detects a sound section, it generates a section ID.
  • the section ID is identification information that can uniquely identify a sound section.
  • the section detection section 51 outputs the information on the sound section and the section ID to the section buffer 52 and the pattern detection section 53.
  • the sound interval information includes information on the start point and end point of the sound interval. The start point and end point of the sound section are specified, for example, by time or the like.
  • the section buffer 52 acquires audible sound data from the filter section 47. Furthermore, the section buffer 52 acquires information on the sound section and the section ID from the section detection section 51. The section buffer 52 extracts audible sound data included in the sound section from among the audible sound data obtained from the filter section 47 based on the sound section information obtained from the section detection section 51. The section buffer 52 stores audible sound data of the extracted sound section in association with the section ID.
  • the section buffer 52 may acquire left and right audible sound data from the filter unit 47. In this case, the section buffer 52 may extract and hold audible sound data for the left and right sections of the sound section.
  • after a predetermined time has elapsed, the section buffer 52 may delete the held audible sound data of the sound section.
  • the predetermined time may be set based on the amount of data that can be held in the section buffer 52.
  • the pattern detection unit 53 acquires inaudible sound data from the filter unit 47. Furthermore, the pattern detection unit 53 acquires information on the sound interval and the interval ID from the interval detection unit 51.
  • when the pattern detection unit 53 acquires the sound section information and the section ID from the section detection unit 51, it extracts the inaudible sound data of the sound section from among the inaudible sound data acquired from the filter unit 47.
  • the pattern detection unit 53 determines whether the inaudible sound component of the extracted sound section satisfies a preset condition. In this embodiment, this condition is that the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound section matches a setting pattern described below. The processing of the pattern detection unit 53 will be explained below.
  • the pattern detection unit 53 divides the extracted inaudible sound data of the sound section into frames having a predetermined frame length. For example, the pattern detection unit 53 divides the inaudible sound data such that one frame contains hundreds to thousands of sound samples. The pattern detection unit 53 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length, in the same or a similar manner to the process described above with reference to FIG. 3.
  • the pattern detection unit 53 performs fast Fourier transform on the digital sound data included in each frame, and obtains a frequency spectrum signal for each frame.
  • This frequency spectrum signal corresponds to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2.
  • the pattern detection unit 53 determines whether the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound section matches the set pattern.
  • the pattern detection unit 53 may determine whether the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound interval matches the set pattern by calculating a two-dimensional cross-correlation.
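A sketch of such a two-dimensional cross-correlation check, assuming the frequency spectrum signals of the sound section are stacked into a spectrogram array at least as large as the setting pattern; the normalization and the 0.8 match threshold are illustrative assumptions:

```python
# A sketch of the pattern match: the section's inaudible spectrogram is
# cross-correlated in 2D with the setting pattern, and a match is declared
# when the normalized peak exceeds a threshold.
import numpy as np
from scipy.signal import correlate2d

def matches_setting_pattern(spectrogram: np.ndarray, pattern: np.ndarray,
                            threshold: float = 0.8) -> bool:
    spec = (spectrogram - spectrogram.mean()) / (spectrogram.std() + 1e-9)
    pat = (pattern - pattern.mean()) / (pattern.std() + 1e-9)
    corr = correlate2d(spec, pat, mode="valid") / pat.size
    return bool(corr.max() >= threshold)
```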
  • the setting pattern is generated based on preset audible sound data, for example.
  • the setting pattern is generated based on, for example, a preset fundamental frequency of an audible sound, the length of the audible sound, and the speed of the audible sound.
  • the setting pattern may be generated based on the shape of the spectral signal of the fundamental frequency components obtained when preset audible sound data is converted into such a signal. For example, assume that the output device 2 is used in a waiting room at an airport and that the user does not want to miss information regarding "Flight 153." In this case, the setting pattern is generated based on the audible sound data of "Flight 153" output by the output device 2.
  • FIG. 5 shows a graph of the frequency spectrum signal sg1.
  • the frequency spectrum signal sg1 is obtained by the pattern detection unit 53 performing fast Fourier transform on inaudible sound data.
  • the horizontal axis indicates time.
  • the vertical axis indicates frequency.
  • the upper part of FIG. 5 shows digital data of audible sounds for reference.
  • the section from time t1 to time t2 is a sound section.
  • in the example of FIG. 5, the pattern detection unit 53 determines that the frequency spectrum signal sg2 included in the dotted-line portion of the frequency spectrum signal sg1 matches the set pattern.
  • when the pattern detection unit 53 determines that the shape of at least a portion of the frequency spectrum signal of the inaudible sound in the sound section matches the set pattern, it outputs a reproduction trigger and the section ID to the reproduction unit 54.
  • the pattern detection unit 53 may acquire left inaudible sound data and right inaudible sound data from the filter unit 47. In this case, the pattern detection unit 53 may determine whether both components of the left inaudible sound and the right inaudible sound in the sound section satisfy a preset condition.
  • the playback unit 54 acquires the playback trigger and section ID from the pattern detection unit 53.
  • the playback unit 54 executes playback processing upon acquiring the playback trigger.
  • the playback unit 54 acquires audible sound data associated with the section ID from the section buffer 52.
  • the reproduction unit 54 sets the replay flag to True, and transmits the acquired audible sound data to the sound collector 3 as reproduction data through the communication unit 40. After transmitting all the reproduction data to the sound collector 3, the reproduction unit 54 sets the replay flag to False.
  • the playback unit 54 may transmit the left audible sound data to the sound collector 3 as left playback data. Furthermore, the reproduction unit 54 may transmit data of the audible sound for the right to the sound collector 3 as reproduction data for the right.
  • the control unit 50 may cause the display unit 43 to display the text corresponding to the set pattern and the detection time at which the sound satisfying the condition was detected.
  • the text corresponding to the setting pattern is a verbalization of the audible sound used to generate the setting pattern. For example, if the text corresponding to the setting pattern is "Flight 512" and the detection time is 11:11, the control unit 50 causes the display unit 43 to display the information "Flight 512, detection time 11:11". .
  • the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the vibration unit 44 to vibrate the processing device 4.
  • the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the light emitting unit 45 to emit light.
  • the control unit 50 may receive a reproduction instruction through the input unit 41.
  • for example, when the text and detection time are displayed on the display unit 43 and the input unit 41 is a touch screen provided integrally with the display of the display unit 43, the reproduction instruction may be a touch operation on the displayed text and detection time.
  • upon receiving the reproduction instruction, the control unit 50 outputs the text for which the touch operation was received, the section ID corresponding to the detection time, and a reproduction trigger to the reproduction unit 54.
  • the reproduction unit 54 executes the reproduction process as described above.
  • FIG. 6 is a flowchart showing the flow of the sound output processing executed by the output device 2 shown in FIG. 2. Hereinafter, it is assumed that the user inputs text data from the input unit 10.
  • the control unit 18 starts processing in step S1.
  • the control unit 18 receives input of text data through the input unit 10 (step S1).
  • the control unit 18 causes the conversion unit 12 to convert the text data received in the process of step S1 into audible sound data (step S2).
  • the control unit 18 electrically connects the conversion unit 12, the delay buffer 14, and the extraction unit 19 using the switch 13.
  • upon acquiring the audible sound data from the switch 13, the extraction unit 19 divides the acquired audible sound data into frames (step S3). The extraction unit 19 then extracts the fundamental frequency from the audible sound samples included in each frame (step S4).
  • when the generation unit 20 acquires the fundamental frequency data for each frame from the extraction unit 19, it generates a sine wave for each frame based on the fundamental frequency data (step S5). After generating the sine waves, the generation unit 20 synthesizes the sine waves of the frames to generate inaudible sound data (step S6).
  • the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data (step S7).
  • the control unit 18 converts the sound data generated in the process of step S7 into sound and outputs the sound using the speaker 16 (step S8).
  • alternatively, the control unit 18 may receive input of audible sound data through the input unit 11.
  • in this case, the control unit 18 electrically connects the input unit 11, the delay buffer 14, and the extraction unit 19 using the switch 13.
  • FIG. 7 is a flowchart showing the flow of the sound collection processing executed by the sound collector 3 shown in FIG. 4.
  • the control unit 34 starts the process of step S11.
  • the control unit 34 determines whether the replay flag from the processing device 4 is True (step S11). When the control unit 34 determines that the replay flag is True (step S11: YES), the process proceeds to step S12. When the control unit 34 determines that the replay flag is False (step S11: NO), the process proceeds to step S13.
  • in step S12, the control unit 34 performs control so that the sound collector 3 operates in the reproduction mode.
  • the playback section 36 causes the speaker 31 to output the playback data accumulated in the storage section 37 .
  • in step S13, the control unit 34 performs control so that the sound collector 3 operates in the through mode.
  • the reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35.
  • after the process of step S12 or step S13, the control unit 34 returns to the process of step S11.
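The mode switching of FIG. 7 can be summarized in the following sketch; the speaker, storage, and acquisition interfaces are hypothetical stand-ins for the units described above:

```python
# A sketch of the loop in FIG. 7: the replay flag received from the processing
# device selects between the reproduction mode and the through mode.
def sound_collection_step(replay_flag: bool, speaker, storage, acquisition):
    if replay_flag:                            # step S11: YES -> step S12
        speaker.output(storage.playback_data())        # reproduction mode
    else:                                      # step S11: NO  -> step S13
        speaker.output(acquisition.external_sound())   # through mode
```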
  • FIGS. 8 and 9 are flowcharts showing the flow of the sound acquisition processing executed by the processing device 4 shown in FIG. 4. For example, when the transmission of external sound digital data from the sound collector 3 to the processing device 4 is started, the control unit 50 starts the process of step S21.
  • the control unit 50 receives digital data of external sound from the sound collector 3 through the communication unit 40 (step S21).
  • the control unit 50 uses the filter unit 47 to separate the received external sound digital data into inaudible sound data and audible sound data (step S22).
  • the section detection unit 51 divides the inaudible sound data acquired from the filter unit 47 into frames (step S23).
  • the section detection unit 51 detects a sound section by determining whether the frame includes sound data other than noise (step S24).
  • the section buffer 52 extracts audible sound data included in the sound section from among the audible sound data obtained from the filter section 47 based on the sound section information obtained from the section detection section 51.
  • the section buffer 52 stores the audible sound data of the extracted sound section in association with the section ID (step S25).
  • the pattern detection unit 53 divides the inaudible sound data of the sound section detected in the process of step S24 into frames (step S26).
  • the pattern detection unit 53 performs fast Fourier transform on the digital sound data included in each frame, and obtains a frequency spectrum signal for each frame (step S27).
  • the pattern detection unit 53 determines whether the shape of at least a portion of the frequency spectrum signal of the inaudible sound data in the sound interval matches the set pattern (step S28).
  • when the pattern detection unit 53 determines that the shape of at least a portion of the frequency spectrum signal of the inaudible sound in the sound section matches the set pattern (step S28: YES), the processing device 4 proceeds to the process of step S29.
  • when the pattern detection unit 53 does not determine that the shape of at least a portion of the frequency spectrum signal of the inaudible sound in the sound section matches the set pattern (step S28: NO), the processing device 4 returns to the process of step S21.
  • upon acquiring the reproduction trigger and the section ID from the pattern detection unit 53, the reproduction unit 54 sets the replay flag to True (step S29).
  • the reproduction unit 54 acquires audible sound data associated with the interval ID from the interval buffer 52, and transmits the acquired audible sound data as reproduction data to the sound collector 3 by the communication unit 40 (step S30).
  • the reproduction unit 54 determines whether all reproduction data has been transmitted to the sound collector 3 (step S31). When determining that all the reproduction data has been transmitted to the sound collector 3 (step S31: YES), the reproduction unit 54 sets the replay flag to False (step S32). If the reproduction unit 54 does not determine that all the reproduction data has been transmitted to the sound collector 3 (step S31: NO), the process returns to step S30.
  • next, the control unit 50 uses the notification unit 42 to notify the user that a sound satisfying the preset condition has been detected (step S33).
  • the control unit 50 determines whether a reproduction instruction is received by the input unit 41 (step S34). When the control unit 50 determines that the reproduction instruction has been received by the input unit 41 (step S34: YES), the process proceeds to step S35 as shown in FIG. On the other hand, if the control unit 50 does not determine that the reproduction instruction has been received by the input unit 41 (step S34: NO), it repeatedly executes the process of step S34.
  • the control unit 50 may end the sound acquisition process if a predetermined period of time has elapsed while repeatedly performing the process of step S34. The predetermined time may be set based on the specifications of the processing device 4.
  • the control unit 50 executes steps S35, S36, S37, and S38 as shown in FIG. 9 in the same manner as or similar to steps S29, S30, S31, and S32 as shown in FIG. After the process in step S38, the control unit 50 returns to the process in step S21.
  • the control unit 18 of the output device 2 receives text data of audible sounds such as announcement sounds output from the output device 2 from the input unit 10.
  • the control unit 18 receives data of an audible sound such as an announcement sound output from the output device 2 from the input unit 11 .
  • the control unit 18 generates an inaudible sound containing the received audible sound information as a component, and the speaker 16 outputs a sound in which the audible range sound and the inaudible range sound are superimposed. Inaudible sounds cannot be heard by the average person. However, some people can hear inaudible sounds and may find them unpleasant.
  • the control unit 50 of the processing device 4 acquires external sound data by having the communication unit 40 receive the external sound data via the sound collector 3 .
  • the external sound data may include announcement sound data output by the output device 2.
  • the control unit 50 detects an audible sound such as an announcement sound outputted by the output device 2 from among the external sounds based on the inaudible sound component of the external sound.
  • inaudible sounds are rare in nature compared to audible sounds. In other words, most of the noise is audible. Therefore, even if the external sound includes noise, the control unit 50 can accurately detect the audible sound output by the output device 2 based on the inaudible sound component of the external sound.
  • as described above, the control unit 50 of the processing device 4 may notify the user that a sound satisfying the condition has been detected. With such a configuration, the user can know that a sound satisfying the condition has been detected.
  • the control unit 50 of the processing device 4 may also execute a playback process for playing back the data of the sound in the audible range of the detected sound section.
  • the playback unit 54 may transmit the data of the audible range sound in the sound section to the sound collector 3 as playback data, and the sound collector 3 may play the data.
  • in an airport waiting room, a station premises, or the like, users rarely pay attention to the announcement sound output by the output device 2. By reproducing the audible-range sound data of the sound section, the user can listen again to a missed announcement sound.
  • the control unit 50 of the processing device 4 executes the playback process for playing back the audible range of the external sound when the inaudible sound component of the external sound satisfies the preset condition.
  • the reproduction section 54 may acquire audible sound data corresponding to the section ID from the section buffer 52.
  • the reproduction unit 54 may transmit the acquired audible sound data to the sound collector 3 and have the sound collector 3 reproduce it. With such a configuration, the user can quickly hear the sound.
•   in another embodiment, a processing system includes the output device 2 shown in FIG. 2, the sound collector 3 shown in FIG. 4, and a processing device 104 shown in FIG. 10, which will be described later.
  • the generation unit 20 generates a sine wave based on fundamental frequency data for each frame, the same as or similar to the above-described embodiment.
  • the generation unit 20 generates the sine wave x(t) at time t using equation (2), for example.
•   a sine wave x(t) containing the fundamental frequency f0 as a component can be generated by equation (2).
•   x(t) = A sin{2πt × (n × f0 + X)} … (2)
  • the numerical value n is a numerical value of 1.0 or more.
•   the numerical value n is set within a range in which (n × f0 + X) does not exceed the sampling rate of the sound collector 3.
  • the numerical value n may be set based on the resolution of the fast Fourier transform performed by the processing device 104. For example, the lower the resolution of the fast Fourier transform, the larger the numerical value n may be set.
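•   A minimal sketch of per-frame sine generation according to equation (2) follows (hypothetical; the function name and the Nyquist check are assumptions — the text above only requires that (n × f0 + X) not exceed the sampling rate of the sound collector 3):

    import numpy as np

    def inaudible_sine(f0, n, X, A, num_samples, sample_rate):
        """x(t) = A * sin(2*pi*t*(n*f0 + X)) for one frame.

        f0 is the frame's fundamental frequency; n and X shift the
        carrier (n*f0 + X) into the inaudible band."""
        carrier = n * f0 + X
        # Stricter than the text's condition: stay below Nyquist so the
        # sound collector can actually represent the carrier.
        if carrier >= sample_rate / 2:
            raise ValueError("carrier too high for this sampling rate")
        t = np.arange(num_samples) / sample_rate
        return A * np.sin(2.0 * np.pi * t * carrier)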
•   after generating a sine wave for each frame, the generation unit 20 synthesizes the sine waves of the frames to generate inaudible sound data, in the same manner as or similar to the embodiment described above. Upon generating the inaudible sound data, the generation unit 20 outputs it to the superimposing unit 15, in the same manner as or similar to the embodiment described above.
•   the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data, in the same manner as or similar to the embodiment described above. However, in other embodiments, when superimposing the audible sound data and the inaudible sound data, the control unit 18 may correct the phase shift between the audible sound data and the inaudible sound data by phase adjustment. Further, the output device 2 may further include a high-pass filter (HPF) for removing noise in the audible range that occurs when the audible sound data and the inaudible sound data are superimposed.
  • the control unit 18 causes the speaker 16 to convert the sound data from the superimposition unit 15 into sound and output it, in the same way or similar to the embodiment described above.
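•   The superposition and the optional HPF described above might look like the following sketch (hypothetical; the 8th-order Butterworth design, the 18 [kHz] cutoff, and the crude length alignment are assumptions — the text mentions phase adjustment rather than truncation):

    import numpy as np
    from scipy.signal import butter, sosfilt

    SAMPLE_RATE = 48_000    # assumed
    HPF_CUTOFF_HZ = 18_000  # keep only the inaudible band of the generated signal

    # High-pass the generated inaudible data so that generation artifacts
    # do not leak noise into the audible range when superimposed.
    _HPF = butter(8, HPF_CUTOFF_HZ, btype="highpass", fs=SAMPLE_RATE, output="sos")

    def superimpose(audible: np.ndarray, inaudible: np.ndarray) -> np.ndarray:
        filtered = sosfilt(_HPF, inaudible)
        length = min(len(audible), len(filtered))
        return audible[:length] + filtered[:length]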
  • the other configurations of the output device 2 according to the other embodiments are the same as or similar to the configuration of the output device 2 according to the embodiment described above.
  • a processing device 104 includes a communication section 40, an input section 41, a notification section 42, a storage section 46, a filter section 47, and a control section 150.
  • the storage unit 46 stores keywords set in advance by the user.
  • the user sets keywords related to information that the user does not want to miss. For example, if the user does not want to miss information regarding "Flight 153,” the user sets a keyword "Flight 153" and stores it in the storage unit 46.
  • the storage unit 46 stores information about the constant X and the integer n in equation (2).
  • the control unit 150 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof.
  • the processor is a general-purpose processor such as a CPU or GPU, or a dedicated processor specialized for specific processing.
  • the dedicated circuit is, for example, an FPGA or an ASIC.
•   the control unit 150 executes processing related to the operation of the processing device 104 while controlling each part of the processing device 104.
•   the control unit 150 includes a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61.
•   the buffer 55 may be a part of the storage unit 46. The operation of the buffer 55 is executed by the processor of the control unit 150 or the like.
  • the control unit 150 receives external sound data from the sound collector 3 through the communication unit 40 in the same manner as or similar to the embodiment described above.
  • the control unit 150 inputs the received external sound data to the filter unit 47 .
  • the filter unit 47 separates the external sound data into inaudible sound data and audible sound data in the same manner as or similar to the embodiment described above.
  • the filter section 47 outputs the inaudible sound data to the section detection section 51.
  • the filter section 47 outputs the audible sound data to the section buffer 52 and the buffer 55.
  • the section detection unit 51 acquires inaudible sound data from the filter unit 47 in the same manner as or similar to the embodiment described above.
  • the section detection unit 51 detects a sound section in which a sound continues in the audible sound data, based on the inaudible sound data, as in or similar to the embodiment described above.
  • the section detection unit 51 generates a section ID when detecting a sound section, in the same way or similar to the embodiment described above.
•   the section detection unit 51 outputs sound section information and the section ID to the section buffer 52, in the same manner as or similar to the embodiment described above. In this other embodiment, the section detection unit 51 also outputs the sound section information and the section ID to the buffer 55.
  • the section detecting section 51 extracts inaudible sound data in a sound section from among the inaudible sound data acquired from the filter section 47.
  • the section detection section 51 outputs the inaudible sound data and section ID of the extracted sound section to the first extraction section 56 .
•   the section detection unit 51 may acquire left inaudible sound data and right inaudible sound data from the filter unit 47. In this case, the section detection unit 51 may extract the left-side inaudible sound data of the sound section from the left inaudible sound data, and may extract the right-side inaudible sound data of the sound section from the right inaudible sound data.
  • the section detecting section 51 may output the left and right inaudible sound data and section ID of the extracted sound section to the first extracting section 56 .
•   the section buffer 52 acquires audible sound data from the filter unit 47 and acquires the sound section information and the section ID from the section detection unit 51, in the same manner as or similar to the embodiment described above. Based on the sound section information acquired from the section detection unit 51, the section buffer 52 extracts the audible sound data included in the sound section from among the audible sound data acquired from the filter unit 47, in the same manner as or similar to the embodiment described above. The section buffer 52 holds the extracted audible sound data of the sound section in association with the section ID.
  • the section buffer 52 outputs the audible sound data and section ID included in the extracted sound section to the removal unit 58.
  • the section buffer 52 may output the left and right audible sound data of the sound section and the section ID to the removal unit 58 .
  • the buffer 55 acquires audible sound data from the filter section 47.
  • the buffer 55 also acquires information on the sound interval and the interval ID from the interval detection unit 51.
  • the buffer 55 extracts audible sound data included in sections other than the sound section from the audible sound data obtained from the filter section 47 based on the sound section information obtained from the section detection section 51.
  • the buffer 55 holds data of audible sounds included in intervals other than the extracted sound interval.
  • the buffer 55 outputs the audible sound data and the section ID included in sections other than the extracted sound section to the second extraction unit 57.
  • the buffer 55 may acquire left and right audible sound data from the filter unit 47. In this case, the buffer 55 may extract left and right audible sound data included in intervals other than the sound interval. The buffer 55 may output data of left and right audible sounds included in a section other than the sound section and the section ID to the second extraction section 57.
  • the first extraction unit 56 acquires the inaudible sound data and the interval ID of the sound interval from the interval detection unit 51.
  • the first extraction unit 56 extracts the fundamental frequency of the audible sound in the sound interval from the acquired data on the inaudible sound in the sound interval.
•   the output device 2 generates inaudible sound data including information on the fundamental frequency of the audible sound using equation (2). Therefore, the inaudible sound data of a sound section includes information on the fundamental frequency of the audible sound in that section.
  • the first extraction unit 56 divides the inaudible sound data into frames having a predetermined frame length in the same way or similar to the process described above with reference to FIG. 3.
•   the first extraction unit 56 performs a fast Fourier transform on the sound data included in each frame, in the same manner as or similar to the processing of the section detection unit 51, and acquires a frequency spectrum signal for each frame.
  • the first extraction unit 56 performs fast Fourier transform on several thousand samples of sound data every several hundred samples.
•   upon acquiring the frequency spectrum signal for each frame, the first extraction unit 56 calculates the power of the frequency spectrum signal for each frame, for example by squaring the frequency spectrum signal. The first extraction unit 56 extracts, from among the frequency spectrum signals for each frame, the frequency spectrum signals whose power is greater than or equal to a power threshold.
  • the power threshold value may be the same as the power threshold value used by the section detection section 51, or may be set separately from the power threshold value used by the section detection section 51.
•   the first extraction unit 56 extracts fundamental frequency information from the extracted frequency spectrum signal. For example, the first extraction unit 56 extracts the fundamental frequency f0 in equation (2) using the extracted frequency spectrum signal and the information on the constant X and the numerical value n in equation (2) stored in the storage unit 46. The fundamental frequency extracted by the first extraction unit 56 in this manner is the fundamental frequency of the audible sound output by the output device 2.
  • the first extracting unit 56 outputs the extracted fundamental frequency information and section ID to the emphasizing unit 59.
  • the first extraction unit 56 may acquire the left and right inaudible sound data and the interval ID from the interval detection unit 51. In this case, the first extraction unit 56 may extract fundamental frequency information from either the left or right inaudible sound data of the sound section.
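•   A sketch of the fundamental-frequency recovery just described follows (hypothetical; it inverts equation (2) by locating the spectral peak, which assumes a single dominant carrier per frame):

    import numpy as np

    def extract_f0(frame, sample_rate, n, X, power_threshold):
        """Recover the audible fundamental from the inaudible carrier.

        The output device placed the carrier at (n*f0 + X), so the peak
        of the inaudible spectrum can be inverted to f0 = (peak - X) / n."""
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
        if spectrum.max() < power_threshold:
            return None  # frame too weak to trust
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        peak_hz = freqs[int(np.argmax(spectrum))]
        return (peak_hz - X) / n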
  • the second extraction unit 57 acquires the audible sound data and the section ID included in sections other than the sound section from the buffer 55.
  • the second extraction unit 57 extracts noise component information based on audible sound data included in an interval other than the acquired sound interval.
•   audible sound data included in sections other than a sound section is unlikely to be the audible sound data output from the output device 2; rather, it is highly likely to be noise data. Therefore, the second extraction unit 57 extracts noise component information based on the audible sound data included in sections other than the sound section.
  • An example of a process for extracting noise will be described below.
  • the second extraction unit 57 divides the audible sound data included in a section other than the sound section into frames having a predetermined frame length, in the same way or similar to the process described above with reference to FIG. 3.
•   the second extraction unit 57 performs a fast Fourier transform on the sound data included in each frame, in the same manner as or similar to the processing by the section detection unit 51, and acquires a frequency spectrum signal for each frame.
  • the second extraction unit 57 performs fast Fourier transform on several thousand samples of sound data every several hundred samples.
  • the second extraction unit 57 acquires the frequency spectrum signal for each frame as a noise frequency spectrum signal. In other words, the second extraction unit 57 acquires the frequency spectrum signal of the noise as information on the noise component.
•   the second extraction unit 57 outputs the noise frequency spectrum signal and the section ID to the removal unit 58.
•   the second extraction unit 57 may acquire, from the buffer 55, the left and right audible sound data included in sections other than the sound section, and the section ID. In this case, the second extraction unit 57 may extract left-side noise component information from the left audible sound data included in sections other than the sound section, and may extract right-side noise component information from the right audible sound data included in sections other than the sound section. The second extraction unit 57 may output the extracted left and right noise component information, for example the noise frequency spectrum signals, and the section ID to the removal unit 58.
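•   One way to realize this noise estimate is to average the magnitude spectra of frames taken outside sound sections, as in the following sketch (hypothetical; averaging is one common estimator, and the disclosure does not fix a particular one):

    import numpy as np

    def noise_spectrum(noise_frames):
        """Average magnitude spectrum of frames outside sound sections.

        Audible data outside a sound section is unlikely to be the
        announcement, so its average spectrum estimates the noise component."""
        window = np.hanning(len(noise_frames[0]))
        mags = [np.abs(np.fft.rfft(f * window)) for f in noise_frames]
        return np.mean(mags, axis=0)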
  • the removal unit 58 acquires the noise frequency spectrum signal and the section ID from the second extraction unit 57 as noise component information.
  • the removal unit 58 acquires the section ID and audible sound data from the section buffer 52.
  • the removal unit 58 removes the noise component from the audible sound data acquired from the section buffer 52 based on the information on the noise component. An example of processing for removing noise components will be described below.
  • the removing unit 58 divides the audible sound data of the sound section into frames having a predetermined frame length in the same way or similar to the process described above with reference to FIG. 3.
  • the removal unit 58 performs a fast Fourier transform on the sound data included in each frame in the same or similar manner as the process performed by the section detection unit 51, and obtains a frequency spectrum signal for each frame.
  • the removing unit 58 performs fast Fourier transform on several thousand samples of sound data every several hundred samples.
  • the removal unit 58 removes noise components from the frequency spectrum signal of the audible sound for each frame based on the frequency spectrum signal of the noise acquired from the second extraction unit 57.
  • the removal unit 58 may remove noise components from the frequency spectrum signal of the audible sound for each frame using any method such as a spectral subtraction method or a Wiener filter.
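•   A minimal spectral-subtraction sketch follows (hypothetical; magnitude subtraction with a small spectral floor and reuse of the noisy phase is one textbook variant of the method named above):

    import numpy as np

    def spectral_subtraction(frame, noise_mag, floor=0.01):
        """Remove the estimated noise magnitude from one audible frame.

        noise_mag must have rfft length len(frame)//2 + 1; the floor
        keeps magnitudes positive and limits "musical noise"."""
        spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
        mag = np.abs(spectrum)
        phase = np.angle(spectrum)
        cleaned = np.maximum(mag - noise_mag, floor * mag)
        return np.fft.irfft(cleaned * np.exp(1j * phase), n=len(frame))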
  • the removing unit 58 outputs the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID to the emphasizing unit 59.
  • the removing unit 58 may obtain left and right audible sound data and the section ID of the sound section from the section buffer 52.
  • the removal unit 58 may acquire information on the left and right noise components and the section ID from the second extraction unit 57.
  • the removal unit 58 may remove the left noise component from the left audible sound data of the sound section, and may remove the right noise component from the right audible sound data of the sound section.
  • the removal unit 58 may output left and right audible sound data from which noise components have been removed, such as frequency spectrum signals, and the section ID to the emphasizing unit 59.
  • the emphasizing unit 59 acquires the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID from the removing unit 58.
  • the emphasizing unit 59 acquires the fundamental frequency information and the section ID from the first extracting unit 56 .
•   based on the fundamental frequency information acquired from the first extraction unit 56, the emphasizing unit 59 emphasizes the frequency spectrum signal of the audible sound output by the output device 2 from among the frequency spectrum signals of the audible sound acquired from the removal unit 58.
•   in a typical person, the frequency of the vocal cord sound source produced by the vocal cords in the throat is the fundamental frequency.
•   a typical human voice is produced when the vocal tract, acting as a vocal tract filter, shapes the spectral envelope of the vocal cord source based on the fundamental frequency. That is, a typical human voice is generated by the convolution of integer multiples of the fundamental frequency, that is, overtones, with the vocal tract filter. Therefore, the emphasizing unit 59 increases the power [dB] of frequencies that are integral multiples of the fundamental frequency acquired from the first extraction unit 56 in the frequency spectrum signal of the audible sound acquired from the removal unit 58. As described above, the fundamental frequency acquired from the first extraction unit 56 is the fundamental frequency of the sound output by the output device 2. By increasing the power [dB] of frequencies that are integral multiples of the fundamental frequency in this way, the frequency spectrum signal of the audible sound output by the output device 2 can be emphasized.
•   for example, the emphasizing unit 59 increases the power [dB] in the frequency ranges of (m × f0 ± F), as shown in FIG. 11.
•   in FIG. 11, the horizontal axis indicates frequency, and the vertical axis indicates the power gain of the frequency spectrum signal.
•   the integer m is an integer satisfying 1 ≤ m ≤ M.
•   the integer M, which is the upper limit of the integer m, may be set depending on the environment in which the processing system 1 is used.
•   the fundamental frequency f0 is the fundamental frequency of the audible sound extracted by the first extraction unit 56.
•   the constant F is set based on the frequency fluctuation of the audible sound output from the output device 2, etc.
•   the power gain at the frequency (m × f0) is set to B times (B satisfies 1 < B).
•   the power gain at each frequency (m × f0 ± F) is set to 1.
•   the power gain is set so that it attenuates linearly from the frequency (m × f0) to the frequencies (m × f0 ± F).
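•   The triangular gain profile just described can be expressed as in the following sketch (hypothetical; taking the per-bin maximum across harmonics is an assumption for the case of overlapping ranges):

    import numpy as np

    def harmonic_gain(freqs, f0, B, F, M):
        """Per-bin power gain: B at each harmonic m*f0, decaying linearly
        to 1 at m*f0 +/- F, and 1 elsewhere, for m = 1..M."""
        gain = np.ones_like(freqs)
        for m in range(1, M + 1):
            dist = np.abs(freqs - m * f0)
            inside = dist <= F
            gain[inside] = np.maximum(gain[inside],
                                      B - (B - 1.0) * dist[inside] / F)
        return gain

    # usage (values assumed): power *= harmonic_gain(freqs, f0, B=4.0, F=20.0, M=10)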
•   after emphasizing the frequency spectrum signal of the audible sound output by the output device 2, the emphasizing unit 59 outputs the emphasized frequency spectrum signal of the audible sound and the section ID to the conversion unit 60.
  • the emphasizing unit 59 may acquire the frequency spectrum signals of the left and right audible sounds from which noise components have been removed and the section ID from the removing unit 58. In this case, the emphasizing unit 59 may emphasize each of the left and right audible sound frequency spectrum signals. The emphasizing section 59 may output the emphasized left and right audible sound frequency spectrum signals and the section ID to the converting section 60 .
  • the converting unit 60 acquires the frequency spectrum signal and the section ID of the audible sound from the emphasizing unit 59.
  • the converter 60 converts the audible sound frequency spectrum signal into time domain audible sound data by inverse short time Fourier transform (ISTFT).
  • the conversion unit 60 outputs the converted time domain audible sound data and the section ID to the playback unit 54 and the recognition unit 61.
  • the converting unit 60 may obtain the frequency spectrum signals of the left and right audible sounds and the section ID from the emphasizing unit 59. In this case, the conversion unit 60 converts the frequency spectrum signal of the left audible sound into time-domain left audible sound data, and converts the frequency spectrum signal of the right audible sound into time-domain right audible sound data. You can convert it to The converter 60 may output left and right audible sound data in the time domain to the playback unit 54 and the recognition unit 61 .
  • the recognition unit 61 acquires time-domain audible sound data and section ID from the conversion unit 60.
  • the recognition unit 61 acquires audible text data by performing speech recognition processing on the acquired audible sound data.
  • the recognition unit 61 determines whether the keyword stored in the storage unit 46 is included in the acquired audible sound text data. If the recognition unit 61 determines that the audible sound text data includes a keyword, it outputs a reproduction trigger and a section ID to the reproduction unit 54 .
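•   The keyword test itself can be as simple as the following sketch (hypothetical; substring matching is an assumption — the disclosure only requires determining whether the keyword is included in the text data):

    def keyword_detected(recognized_text, keywords):
        """True when any stored keyword (e.g. "Flight 153") appears in the
        text recognized from the de-noised, emphasized audible sound."""
        return any(keyword in recognized_text for keyword in keywords)

    # e.g. if keyword_detected(text, stored_keywords):
    #          output the reproduction trigger and section ID (as above)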
  • the recognition unit 61 may acquire left and right audible sound data in the time domain from the conversion unit 60, and may acquire text data of left and right audible sounds. In this case, if the recognition unit 61 determines that the keyword stored in the storage unit 46 is included in either the left or right audible sound text data, the recognition unit 61 outputs the reproduction trigger and section ID to the reproduction unit 54. You may do so.
  • the playback unit 54 acquires the converted time domain audible sound data and the section ID from the conversion unit 60.
  • the playback unit 54 acquires the playback trigger and section ID from the recognition unit 61.
  • the reproduction unit 54 executes reproduction processing.
  • the playback unit 54 sets the replay flag to True.
  • the playback section 54 uses, as playback data, audible sound data having the same section ID as the section ID obtained from the recognition section 61 together with the playback trigger.
  • the reproduction section 54 transmits reproduction data to the sound collector 3 through the communication section 40 . After transmitting all the reproduction data to the sound collector 3, the reproduction unit 54 sets the replay flag to False.
  • the playback unit 54 may obtain the left and right audible sound data and the section ID in the time domain from the conversion unit 60. In this case, the playback unit 54 may use left and right audible sound data in the time domain as playback data.
  • the other configuration of the processing device 104 is the same as or similar to the configuration of the processing device 4 according to the embodiment described above.
•   the operation of the output device 2 according to the other embodiment will be explained using the flowchart shown in FIG. 6. However, in the process of step S5, the generation unit 20 generates a sine wave using equation (2).
  • FIG. 12 is a flowchart showing the flow of sound acquisition processing executed by the processing device 104 according to another embodiment. For example, when the transmission of external sound digital data from the sound collector 3 to the processing device 104 is started, the control unit 150 starts the process of step S41.
•   the processing device 104 executes the processes of steps S41 to S44 in the same manner as or similar to the processes of steps S21 to S24 shown in FIG. 8.
  • the first extraction unit 56 acquires the inaudible sound data and the interval ID of the sound interval from the interval detection unit 51.
  • the first extraction unit 56 extracts the fundamental frequency of the audible sound in the sound interval from the acquired data on the inaudible sound in the sound interval.
  • the second extraction unit 57 acquires the audible sound data and the section ID included in the section other than the sound section from the buffer 55.
  • the second extraction unit 57 extracts noise component information based on audible sound data included in an interval other than the acquired sound interval.
  • the removal unit 58 acquires the section ID and audible sound data from the section buffer 52.
  • the removal unit 58 removes the noise component from the audible sound data obtained from the section buffer 52 based on the noise component information obtained in the process of step S46.
•   the emphasizing unit 59 increases the power of sounds with frequencies that are integer multiples of the fundamental frequency extracted in the process of step S45, in the frequency spectrum signal of the audible sound from which the noise component was removed in the process of step S47.
  • the conversion unit 60 converts the frequency spectrum signal of the audible sound on which the process of step S48 was executed into time domain audible sound data.
  • the recognition unit 61 acquires audible text data by performing speech recognition processing on time domain audible sound data.
•   in step S50, the recognition unit 61 determines whether the keyword stored in the storage unit 46 is included in the audible sound text data acquired in the process of step S49. If the recognition unit 61 determines that the audible sound text data includes the keyword (step S50: YES), the process proceeds to step S29 shown in FIG. 8. If the recognition unit 61 determines that the audible sound text data does not include the keyword (step S50: NO), the processing device 104 returns to the process of step S41.
  • the first extraction unit 56 of the control unit 150 extracts fundamental frequency information from the data of the sound in the inaudible range of the sound interval. Furthermore, the emphasizing unit 59 of the control unit 150 increases the power of sounds with frequencies that are integral multiples of the extracted fundamental frequency, among the audible sounds in the sound section. With such a configuration, the power of the audible sound output by the output device 2 among the audible sounds collected by the sound collector 3 can be increased. By increasing the power of the audible sound output by the output device 2, the influence of noise on the audible sound is reduced.
  • the recognition unit 61 can accurately determine whether or not the audible sound text data includes preset text data.
  • the second extraction unit 57 of the control unit 150 may extract information on noise components based on data of sounds in the audible range included in sections other than the sound sections. Further, the removal unit 58 of the control unit 150 may remove noise components from the audible sound data of the sound section. Such a configuration reduces the influence of noise on audible sounds. Since the influence of noise on the audible sound is reduced, when the audible sound is reproduced as reproduction data, the user can accurately hear the sound such as the announcement sound outputted by the output device 2.
•   each functional unit, each means, each step, and the like may be added to other embodiments, or replaced with a functional unit, means, step, or the like of another embodiment, so long as no logical contradiction arises. Furthermore, in each embodiment, a plurality of functional units, means, steps, and the like can be combined into one or divided. Furthermore, the embodiments of the present disclosure described above are not limited to being implemented faithfully to each of the described embodiments; they may be implemented by combining features or omitting some features as appropriate.
  • the audible sound data input from the input unit 11 of the output device 2 and the audible sound data converted by the converting unit 12 are described as digital sound data.
  • the audible sound data input from the input unit 11 of the output device 2 and the audible sound data converted by the converting unit 12 may be analog sound data.
  • the input section 11 may include a microphone and the like. That is, the user may input audible sound data, which is analog data, from the microphone of the input unit 11 .
  • the extraction unit 19 may obtain audible sound data as digital data by sampling audible sound data as analog data at a preset sampling rate.
  • the sound collector 3 and the processing device 4 are described as being separate devices, as shown in FIG. 4. Further, as shown in FIG. 10, the sound collector 3 and the processing device 104 have been described as being separate devices. However, the sound collector 3 and the processing device 4 may be configured as an integrated device. Further, the sound collector 3 and the processing device 104 may be configured as an integrated device. An example of configuring the sound collector 3 and the processing device 4 as an integrated device will be described with reference to FIG. 13.
  • the sound collector 103 as shown in FIG. 13 may be an earphone.
  • the sound collector 103 is a combination of the sound collector 3 and the processing device 4.
  • the sound collector 103 can also be called a processing device.
  • the sound collector 103 includes a microphone 30, a speaker 31, a storage section 33, a filter section 47, and a control section 34.
  • the control section 34 includes an acquisition section 35 , a reproduction section 36 , a section detection section 51 , a section buffer 52 , and a pattern detection section 53 .
•   when the section detection unit 51 detects a sound section, it outputs the sound section information to the section buffer 52 and the pattern detection unit 53.
  • the section buffer 52 extracts audible sound data included in the sound section from among the audible sound data obtained from the filter section 47 based on the information on the sound section obtained from the section detection section 51.
  • the section buffer 52 holds audible sound data of the extracted sound section.
  • the pattern detection section 53 determines whether inaudible sound data in the extracted sound section satisfies preset conditions. When the pattern detection unit 53 determines that the condition is satisfied, it outputs a reproduction trigger to the reproduction unit 36.
•   when the reproduction unit 36 obtains a reproduction trigger from the pattern detection unit 53, it acquires the audible sound data of the latest sound section from among the audible sound data of the sound sections held in the section buffer 52.
  • the reproduction unit 36 causes the speaker 31 to output the audible sound data of the acquired sound section.
•   other configurations and effects of the sound collector 103 are the same as or similar to those of the sound collector 3 and the processing device 4 shown in FIG. 4.
•   when the sound collector 3 and the processing device 104 are configured as an integrated device, the control unit may include a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61.
•   the processing device includes a control unit that, when acquiring external sound data in which information on an audible-range sound is included in an inaudible-range sound component, detects the sound in the audible range based on the inaudible-range sound component.
  • the control unit may detect a sound section in which the sound continues in the audible range based on the component of the sound in the inaudible range.
  • the control unit may hold data of sounds in an audible range in the sound section.
•   the control unit may notify the user that a sound satisfying the condition has been detected.
•   the control unit may execute a playback process to play back the sound data in the audible range of the sound section.
  • the control unit may execute a reproduction process of reproducing the sound in the audible range when the component of the sound in the inaudible range satisfies a preset condition.
  • the information on the sound in the audible range may be information on the fundamental frequency of the sound in the audible range.
•   the condition may be that at least a part of the frequency spectrum signal of the inaudible sound matches a set pattern.
•   the set pattern may be generated based on preset audible sound data.
•   when the information on the sound in the audible range is information on the fundamental frequency of the sound in the audible range, the control unit may extract the fundamental frequency information from the data of the inaudible-range sound of the sound section, and may increase the power of sounds with frequencies that are integral multiples of the fundamental frequency among the sounds in the audible range of the sound section.
•   the control unit may extract noise component information based on the data of the audible-range sound included in sections other than the sound section, and may remove the noise component from the audible-range sound data of the sound section.
•   the control unit may obtain text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and, if the acquired text data includes a preset keyword, may execute a reproduction process to reproduce the audible-range sound data of the sound section.
•   the control unit may obtain text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and, if the acquired text data includes a preset keyword, may notify the user that a sound including the keyword has been detected.
•   the control unit may divide the external sound into inaudible-range sound data and audible-range sound data using the filter unit.
•   the output device includes: a speaker; and a control unit that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed through the speaker.
•   the processing system includes: an output device that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and a processing device that, when acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
•   in the present disclosure, descriptions such as “first” and “second” are identifiers for distinguishing the configurations.
•   the numbers of configurations distinguished by descriptions such as “first” and “second” in the present disclosure can be exchanged. For example, the first extraction unit 56 can exchange the identifiers “first” and “second” with the second extraction unit 57.
•   the exchange of identifiers takes place simultaneously, and the configurations remain distinguished after the exchange.
•   identifiers may be removed; configurations whose identifiers are removed are distinguished by reference signs.
•   the description of identifiers such as “first” and “second” in the present disclosure shall not be used to interpret the order of the configurations or as grounds for the existence of an identifier with a smaller number.
•   1 processing system; 2 output device; 3, 103 sound collector; 4, 104 processing device; 10 input section; 11 input section; 12 conversion section; 13 switch; 14 delay buffer; 15 superposition section; 16 speaker; 17 storage section; 18 control section; 19 extraction section; 20 generation section; 30 microphone; 31 speaker; 32 communication section; 33 storage section; 34 control section; 35 acquisition section; 36 reproduction section; 37 storage section; 40 communication section; 41 input section; 42 notification section; 43 display section; 44 vibration section; 45 light emitting section; 46 storage section; 47 filter section; 50, 150 control section; 51 section detection section; 52 section buffer; 53 pattern detection section; 54 reproduction section; 55 buffer; 56 first extraction section; 57 second extraction section; 58 removal section; 59 emphasis section; 60 conversion section; 61 recognition section


Abstract

This processing device comprises a control unit. When data of external sound, in which information of audible-range sound is contained in the components of inaudible-range sound, is acquired, sound in the audible range is detected according to the components of the inaudible-range sound.

Description

Processing device, output device, and processing system

Cross-reference to related applications

This application claims priority to Japanese Patent Application No. 2022-147250 filed in Japan on September 15, 2022, and the entire disclosure of this earlier application is incorporated herein by reference.

The present disclosure relates to a processing device, an output device, and a processing system.

Conventionally, techniques are known that allow a user to listen to surrounding sounds while wearing a processing device such as headphones or earphones. In such technology, a portable music playback device is known that includes a notification unit that notifies the user of a match through the headphones when an external sound matches a predetermined phrase (Patent Document 1).

Patent Document 1: Japanese Patent Application Publication No. 2001-256771
A processing device according to an embodiment of the present disclosure includes a control unit that, when acquiring external sound data in which information on an audible-range sound is included in an inaudible-range sound component, detects the sound in the audible range based on the inaudible-range sound component.

An output device according to an embodiment of the present disclosure includes a speaker, and a control unit that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed through the speaker.

A processing system according to an embodiment of the present disclosure includes: an output device that, when receiving audible-range sound data, generates an inaudible-range sound containing the information on the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and a processing device that, when acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
FIG. 1 is a diagram showing a schematic configuration of a processing system according to an embodiment of the present disclosure.
FIG. 2 is a block diagram of the output device shown in FIG. 1.
FIG. 3 is a diagram for explaining frame division.
FIG. 4 is a block diagram of the sound collector and processing device shown in FIG. 1.
FIG. 5 is a graph of a frequency spectrum signal.
FIG. 6 is a flowchart showing the flow of sound output processing executed by the output device shown in FIG. 2.
FIG. 7 is a flowchart showing the flow of sound collection processing executed by the sound collector shown in FIG. 4.
FIG. 8 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
FIG. 9 is a flowchart showing the flow of sound acquisition processing executed by the processing device shown in FIG. 4.
FIG. 10 is a block diagram of a processing device according to another embodiment.
FIG. 11 is a diagram for explaining processing for emphasizing overtones.
FIG. 12 is a flowchart showing the flow of sound acquisition processing executed by a processing device according to another embodiment.
FIG. 13 is a block diagram of a modification of the sound collector (processing device) shown in FIG. 4.
There is room for improvement in the conventional technology. For example, a user may use a processing device, such as headphones or earphones, in noisy conditions. In this case, it may become difficult for the processing device to distinguish between the noise and the voice to be detected. According to an embodiment of the present disclosure, a technique for detecting sound with high accuracy can be provided.

Hereinafter, embodiments according to the present disclosure will be described with reference to the drawings.
(Processing system configuration)

As shown in FIG. 1, the processing system 1 includes an output device 2, a sound collector 3, and a processing device 4. In this embodiment, the sound collector 3 and the processing device 4 are configured as separate devices. However, the sound collector 3 and the processing device 4 may be configured as an integrated device as shown in FIG. 13, which will be described later.

The output device 2 is used, for example, in an airport waiting room or a station premises. The output device 2 may be part of a public address system installed in a building.

The output device 2 outputs audible sound. For example, the output device 2 outputs an audible announcement sound for notifying the estimated arrival time of an airplane or train. Audible sounds are sounds that can be heard by the average human ear. The frequency band of the audible sound, that is, the audible range, is, for example, a band of 20 [Hz] to 18 [kHz].

The output device 2 outputs inaudible sounds along with audible sounds. Inaudible sounds are sounds that cannot be heard by the average human ear. The frequency band of the inaudible sound, that is, the inaudible range, is, for example, a band of 20 [Hz] or less, a band of 18 [kHz] to 22 [kHz], or a band of 22 [kHz] or more. In this embodiment, the output device 2 outputs inaudible sound in a band of 18 [kHz] to 22 [kHz].

The inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Since the inaudible sound component includes audible sound information, the processing device 4 can accurately detect the audible sound output by the output device 2 even under noisy conditions, as will be described later.

The sound collector 3 is, for example, an earphone. However, the sound collector 3 is not limited to earphones. The sound collector 3 may be headphones or the like. The sound collector 3 is worn by the user. The sound collector 3 can output music and the like to the user. The sound collector 3 may include an earphone section that is attached to the user's left ear, and an earphone section that is attached to the user's right ear.

The sound collector 3 collects external sounds around the sound collector 3. External sound is sound emitted outside the sound collector 3. The sound collector 3 collects external sounds around the user by being worn by the user. That is, external sounds include sounds emitted around the user. External sounds may include sounds made by the user himself. The sound collector 3 outputs the collected external sounds around the user to the user under the control of the processing device 4. With this configuration, the user can hear external sounds around him while wearing the sound collector 3.

The processing device 4 is, for example, a smartphone, a mobile phone, a tablet, or a personal computer (PC). However, the processing device 4 is not limited to this.

The processing device 4 is operated by a user. The user can operate the processing device 4 to make settings for the sound collector 3 and the like.

The processing device 4 acquires data on external sounds collected by the sound collector 3. The external sound collected by the sound collector 3 may include noise in addition to the audible sound and inaudible sound output by the output device 2. Here, inaudible sounds are rare in nature compared to audible sounds. Therefore, most of the noise is audible, and the inaudible sound output by the output device 2 is not easily affected by noise. Further, as described above, the inaudible sound component output by the output device 2 includes information about the audible sound output by the output device 2. Therefore, by analyzing the inaudible sound of the external sound collected by the sound collector 3, the processing device 4 can accurately detect the audible sound output by the output device 2 even if the external sound includes noise.
(Configuration of output device)

As shown in FIG. 2, the output device 2 includes an input unit 10, an input unit 11, a conversion unit 12, a switch 13, a delay buffer 14, a superimposition unit 15, a speaker 16, a storage unit 17, and a control unit 18.

In the following, digital sound data refers to data obtained by sampling analog sound data at a preset sampling rate. Analog sound data refers to sound data collected by a microphone or the like.

The input unit 10 can receive text data input from the user. The input unit 10 includes at least one input interface that can accept input of text data. The input interface includes, for example, a keyboard. The input unit 10 may receive text data input from another device. In this case, the input unit 10 may be configured to include at least one connection interface connectable to other devices. The connection interface is an interface compatible with standards such as USB (Universal Serial Bus), HDMI (registered trademark) (High-Definition Multimedia Interface), or Bluetooth (registered trademark).

The input unit 11 can receive input of audible sound data from other devices. It is assumed that the audible sound data input from the input unit 11 is digital audible sound data. The input unit 11 includes at least one connection interface connectable to other devices. The connection interface may be the same as or similar to that of the input unit 10.

The conversion unit 12 acquires text data from the input unit 10 under the control of the control unit 18. The conversion unit 12 converts the text data into audible sound data under the control of the control unit 18. For example, the conversion unit 12 converts text data into audible sound data by text-to-speech synthesis. Speech data used for text-to-speech synthesis may be stored in the storage unit 17. It is assumed that the audible sound data after conversion is digital sound data.

The switch 13 is connected between the conversion unit 12, the input unit 11, the delay buffer 14, and the control unit 18. The switch 13 switches the electrical connection relationship between the input unit 11, the conversion unit 12, the delay buffer 14, and the control unit 18 based on the control of the control unit 18. The switch 13 includes, for example, an arbitrary switching element such as a transistor.

The delay buffer 14 is a temporary storage memory. The delay buffer 14 acquires audible sound data from the switch 13 under the control of the control unit 18. The delay buffer 14 holds the acquired audible sound data for a predetermined period of time. The predetermined time is, for example, the time from when the extraction unit 19 (described later) acquires audible sound data from the switch 13 until the generation unit 20 (described later) outputs inaudible sound data to the superimposition unit 15. After holding the audible sound data for the predetermined time, the delay buffer 14 outputs the audible sound data to the superimposition unit 15. The delay buffer 14 is configured to include the same or similar components as the storage unit 17, which will be described later. The delay buffer 14 may be a part of the storage unit 17.

The superimposition unit 15 acquires audible sound data from the delay buffer 14 under the control of the control unit 18. The superimposition unit 15 acquires inaudible sound data from the control unit 18 under the control of the control unit 18. The superimposition unit 15 superimposes the audible sound data from the delay buffer 14 and the inaudible sound data from the control unit 18 under the control of the control unit 18. The superimposition unit 15 outputs to the speaker 16 the sound data in which the audible sound data and the inaudible sound data are superimposed.

The speaker 16 is capable of outputting sound. The speaker 16 is, for example, a loudspeaker that can convert electrical signals into sound. The speaker 16 acquires sound data, which is an electrical signal, from the superimposition unit 15 under the control of the control unit 18. Under the control of the control unit 18, the speaker 16 converts the acquired sound data into sound and outputs the sound.

The storage unit 17 is configured to include at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, a RAM (Random Access Memory) or a ROM (Read Only Memory). The RAM is, for example, an SRAM (Static Random Access Memory) or a DRAM (Dynamic Random Access Memory). The ROM is, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory). The storage unit 17 may function as a main storage device, an auxiliary storage device, or a cache memory. The storage unit 17 stores data used for the operation of the output device 2 and data obtained by the operation of the output device 2. For example, the storage unit 17 stores the speech data used by the conversion unit 12 for text-to-speech synthesis.

The control unit 18 is configured to include at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). The control unit 18 executes processing related to the operation of the output device 2 while controlling each part of the output device 2.

The control unit 18 acquires audible sound data from the input unit 11 or the conversion unit 12 via the switch 13. Upon acquiring the audible sound data, the control unit 18 generates inaudible sound data that includes the audible sound information as a sound component. In order to execute this process, in this embodiment, the control unit 18 includes an extraction unit 19 and a generation unit 20. Details of the processing by the extraction unit 19 and the generation unit 20 will be described later.
 <Sound output processing>
 The sound output processing executed by the output device 2 will be described below.
 First, the user inputs, from the input unit 10, text data of an announcement sound to be output from the output device 2. Alternatively, the user inputs, from the input unit 11, data of an audible sound that is the announcement sound to be output from the output device 2.
 When the text data of the announcement sound is input from the input unit 10, the control unit 18 receives the input of the text data through the input unit 10. The control unit 18 causes the conversion unit 12 to convert the received text data into audible sound data. The control unit 18 electrically connects the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13. With this configuration, the audible sound data converted by the conversion unit 12 is output from the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 When the data of an audible sound that is the announcement sound is input from the input unit 11, the control unit 18 receives the input of the audible sound data through the input unit 11. Upon receiving the input of the audible sound data, the control unit 18 electrically connects the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13. With this configuration, the audible sound data received through the input unit 11 is output from the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 The extraction unit 19 acquires the audible sound data from the input unit 11 or the conversion unit 12 via the switch 13. From the acquired audible sound data, the extraction unit 19 extracts the audible sound information to be included as a component of the inaudible sound output from the output device 2. In this embodiment, the extraction unit 19 extracts the fundamental frequency of the audible sound as the audible sound information to be included as a component of the inaudible sound. In this embodiment, the extraction unit 19 extracts the fundamental frequency from the audible sound data by a short-time Fourier transform (STFT). However, the extraction unit 19 may extract the fundamental frequency from the audible sound data by any method. This processing by the extraction unit 19 will be described below.
 The extraction unit 19 divides the audible sound data into frames each having a predetermined frame length. For example, the extraction unit 19 divides the audible sound data such that one frame contains several hundred to several thousand sound samples. The extraction unit 19 divides the audible sound data into frames at intervals of 1/2 to 1/4 of the frame length. This will be described below with reference to FIG. 3.
 The left side of FIG. 3 shows a graph of the audible sound data, with time on the horizontal axis. In FIG. 3, the frame length is the frame length L1. The extraction unit 19 divides the audible sound data into frames at intervals of 1/3 of the frame length L1, thereby obtaining the frames Fr1, Fr2, and Fr3 shown on the right side of FIG. 3.
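 As an illustration, this frame division can be sketched as follows in Python; the function name split_frames, the example frame length, and the NumPy dependency are assumptions for illustration, not details given in the embodiment.

```python
import numpy as np

def split_frames(samples: np.ndarray, frame_len: int) -> list[np.ndarray]:
    # Shift by one third of the frame length, as in FIG. 3.
    hop = frame_len // 3
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]

# Example: 1024-sample frames from one second of (silent) 48 kHz audio.
frames = split_frames(np.zeros(48000), 1024)
```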
 The extraction unit 19 extracts, for each frame, the fundamental frequency from the audible sound samples contained in the frame. For example, the extraction unit 19 may extract the fundamental frequency from the period at which the autocorrelation function of the audible sound samples contained in the frame takes its peak. Alternatively, the extraction unit 19 may extract the fundamental frequency by the cepstrum method.
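 The autocorrelation-based method mentioned above could, for example, be sketched as follows; the search range of 80 to 400 Hz and the function name are assumed parameter choices, not values given in the embodiment.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, fs: float,
                f_min: float = 80.0, f_max: float = 400.0) -> float:
    frame = frame - frame.mean()                       # remove DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(fs / f_max)                           # shortest candidate period
    lag_hi = min(int(fs / f_min), len(ac) - 1)         # longest candidate period
    lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi + 1]))
    return fs / lag                                    # F0 = 1 / peak period
```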
 The extraction unit 19 outputs the fundamental frequency data extracted for each frame to the generation unit 20.
 The generation unit 20 acquires the fundamental frequency data for each frame from the extraction unit 19. The generation unit 20 generates inaudible sound data that includes the fundamental frequency, which is the audible sound information, as a sound component.
 In this embodiment, the generation unit 20 generates a sine wave for each frame based on the fundamental frequency data. This sine wave constitutes the inaudible sound data. By synthesizing these sine waves as described below, the inaudible sound data to be output from the output device 2 is generated. For example, the generation unit 20 generates the sine wave x(t) at time t using formula (1), which yields a sine wave having the fundamental frequency f0 as a component.
   x(t) = A sin{2πt·(f0 + X)}   Formula (1)
 In formula (1), the amplitude A may be set based on the intensity of the audible sound data.
 In formula (1), the constant X may be set based on the inaudible range used in the processing system 1. The constant X is, for example, 18 [kHz].
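 A minimal sketch of formula (1) in Python follows; the function name and example parameter values are assumptions, with the shift X set to the 18 [kHz] example above. Note that the sampling rate must exceed 2·(f0 + X) for the shifted tone to be representable.

```python
import numpy as np

def inaudible_sine(f0: float, n_samples: int, fs: float,
                   amplitude: float = 1.0, x_shift: float = 18_000.0) -> np.ndarray:
    t = np.arange(n_samples) / fs        # discrete time axis in seconds
    # x(t) = A sin{2πt(f0 + X)}: the fundamental frequency carried at f0 + X.
    return amplitude * np.sin(2.0 * np.pi * t * (f0 + x_shift))

tone = inaudible_sine(f0=200.0, n_samples=1024, fs=48_000.0)
```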
 After generating a sine wave for each frame, the generation unit 20 synthesizes the per-frame sine waves to generate the inaudible sound data. The generation unit 20 may multiply the sine wave at the frame ends by a window function before synthesizing the per-frame sine waves. The window function is, for example, a Hamming window function. Multiplying the sine wave at the frame ends by a window function reduces the aliasing noise at the frame ends.
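 The windowed synthesis described above can be sketched as an overlap-add of the per-frame sine waves; the Hamming window follows the text, while the hop length and function name are illustrative assumptions.

```python
import numpy as np

def overlap_add(frames: list[np.ndarray], hop: int) -> np.ndarray:
    frame_len = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    window = np.hamming(frame_len)       # tapers the frame ends
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += window * frame
    return out
```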
 After generating the inaudible sound data, the generation unit 20 outputs the generated inaudible sound data to the superimposition unit 15.
 The control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data, and causes the superimposition unit 15 to output the generated sound data to the speaker 16. The control unit 18 causes the speaker 16 to convert the sound data from the superimposition unit 15 into sound and output it.
 (Configuration of the sound collector)
 As shown in FIG. 4, the sound collector 3 includes a microphone 30, a speaker 31, a communication unit 32, a storage unit 33, and a control unit 34. In FIG. 4, the main flows of data are shown by solid lines.
 The microphone 30 can collect external sounds around the sound collector 3. The microphone 30 includes a left microphone and a right microphone. The left microphone may be included in an earphone unit of the sound collector 3 that is worn on the user's left ear. The right microphone may be included in an earphone unit of the sound collector 3 that is worn on the user's right ear. For example, the microphone 30 is a stereo microphone or the like.
 The speaker 31 can output sound. The speaker 31 includes a left speaker and a right speaker. The left speaker may be included in an earphone unit of the sound collector 3 that is worn on the user's left ear. The right speaker may be included in an earphone unit of the sound collector 3 that is worn on the user's right ear. For example, the speaker 31 is a stereo speaker or the like.
 The communication unit 32 includes at least one communication module capable of communicating with the processing device 4 via a communication line. The communication module conforms to the standard of the communication line. The standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard such as Bluetooth (registered trademark), infrared, or NFC (Near Field Communication).
 The storage unit 33 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, a RAM or a ROM. The RAM is, for example, an SRAM or a DRAM. The ROM is, for example, an EEPROM. The storage unit 33 may function as a main storage device, an auxiliary storage device, or a cache memory. The storage unit 33 stores data used for the operation of the sound collector 3 and data obtained by the operation of the sound collector 3. For example, the storage unit 33 stores system programs, application programs, embedded software, and the like.
 The control unit 34 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU or a GPU, or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA or an ASIC. The control unit 34 executes processing related to the operation of the sound collector 3 while controlling each part of the sound collector 3.
 The control unit 34 includes an acquisition unit 35, a reproduction unit 36, and an accumulation unit 37. The accumulation unit 37 includes the same or similar components as the storage unit 33. At least a part of the accumulation unit 37 may be a part of the storage unit 33. The operations of the accumulation unit 37 are executed by the processor or the like of the control unit 34.
 The acquisition unit 35 acquires digital data of the external sound from the analog data of the external sound collected by the microphone 30. For example, the acquisition unit 35 acquires the digital data of the external sound by sampling the analog data of the external sound at a preset sampling rate.
 The acquisition unit 35 outputs the digital data of the external sound to the reproduction unit 36. The acquisition unit 35 also transmits the digital data of the external sound to the processing device 4 through the communication unit 32.
 When the microphone 30 includes the left microphone and the right microphone, the acquisition unit 35 may acquire digital data of the left external sound from the analog data of the external sound collected by the left microphone, and may acquire digital data of the right external sound from the analog data of the external sound collected by the right microphone. The acquisition unit 35 may output the digital data of the left external sound and the digital data of the right external sound to the reproduction unit 36, and may transmit them to the processing device 4 through the communication unit 32. Hereinafter, when the digital data of the left external sound and the digital data of the right external sound need not be distinguished, they are simply referred to as the "digital data of the external sound".
 The reproduction unit 36 acquires the digital data of the external sound from the acquisition unit 35. The reproduction unit 36 receives a replay flag from the processing device 4 through the communication unit 32.
 The replay flag is set to True or False by the processing device 4, as described later.
 When the replay flag is False, the sound collector 3 and the processing device 4 operate in a through mode. The through mode is a mode in which the external sound collected by the sound collector 3 is output from the sound collector 3 to the user without passing through the processing device 4. When the sound collector 3 and the like operate in the through mode, the user can hear the surrounding external sounds while wearing the sound collector 3.
 When the replay flag is True, the sound collector 3 and the processing device 4 operate in a reproduction mode. The reproduction mode is a mode in which the sound collector 3 outputs the reproduction data acquired from the processing device 4 through the speaker 31.
 When the replay flag is False, the reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35. When the reproduction unit 36 acquires the digital data of the left and right external sounds from the acquisition unit 35, it may cause the left speaker of the speaker 31 to output the digital data of the left external sound and the right speaker of the speaker 31 to output the digital data of the right external sound.
 When the replay flag is True, the reproduction unit 36 causes the speaker 31 to output the reproduction data accumulated in the accumulation unit 37.
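 The branching of the reproduction unit 36 on the replay flag can be summarized in the following minimal sketch; the callable names are illustrative abstractions of the speaker output, not part of the embodiment.

```python
def route_output(replay_flag: bool, external_sound, playback_data, speaker) -> None:
    if replay_flag:
        speaker(playback_data)    # reproduction mode: play accumulated data
    else:
        speaker(external_sound)   # through mode: pass the collected sound through
```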
 The accumulation unit 37 receives the reproduction data from the processing device 4 through the communication unit 32 and accumulates the received reproduction data.
 The accumulation unit 37 may receive left reproduction data and right reproduction data from the processing device 4 and accumulate them. In this case, the reproduction unit 36 may cause the left speaker of the speaker 31 to output the left reproduction data accumulated in the accumulation unit 37 and the right speaker of the speaker 31 to output the right reproduction data accumulated in the accumulation unit 37.
 (Configuration of the processing device)
 As shown in FIG. 4, the processing device 4 includes a communication unit 40, an input unit 41, a notification unit 42, a storage unit 46, a filter unit 47, and a control unit 50.
 The communication unit 40 includes at least one communication module capable of communicating with the sound collector 3 via a communication line. The communication module conforms to the standard of the communication line. The standard of the communication line is, for example, a wired communication standard or a short-range wireless communication standard such as Bluetooth (registered trademark), infrared, or NFC.
 The input unit 41 can accept input from the user. The input unit 41 includes at least one input interface capable of accepting input from the user. The input interface is, for example, a physical key, a capacitive key, a pointing device, a touch screen provided integrally with the display of the display unit 43, a microphone, or the like.
 The notification unit 42 can notify the user of information. The notification unit 42 includes a display unit 43, a vibration unit 44, and a light emitting unit 45. However, the components of the notification unit 42 are not limited to these; the notification unit 42 may include any component capable of notifying the user of information.
 The display unit 43 can display data. By displaying data, the display unit 43 notifies the user of information corresponding to the data. The display unit 43 is, for example, a display such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display.
 The vibration unit 44 can vibrate the processing device 4. By vibrating the processing device 4, the vibration unit 44 notifies the user of information corresponding to the vibration mode. The vibration unit 44 includes, for example, a vibration element such as a piezoelectric element.
 The light emitting unit 45 can emit light. By emitting light, the light emitting unit 45 notifies the user of information corresponding to the light emission mode. The light emitting unit 45 includes, for example, an LED (Light Emitting Diode) or the like.
 The storage unit 46 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, a RAM or a ROM. The RAM is, for example, an SRAM or a DRAM. The ROM is, for example, an EEPROM. The storage unit 46 may function as a main storage device, an auxiliary storage device, or a cache memory. The storage unit 46 stores data used for the operation of the processing device 4 and data obtained by the operation of the processing device 4. For example, the storage unit 46 stores system programs, application programs, embedded software, and the like. For example, the storage unit 46 stores a setting pattern described later.
 The filter unit 47 includes, for example, a band-pass filter for the audible range through which only audible sound data can pass, and a band-pass filter for the inaudible range through which only inaudible sound data can pass. The band-pass filter for the audible range passes only sound data of, for example, 20 [Hz] to 10 [kHz]. The band-pass filter for the inaudible range passes sound data of, for example, 18 [kHz] to 22 [kHz]. When the control unit 50 receives the digital data of the external sound from the sound collector 3 through the communication unit 40, it causes the filter unit 47 to separate the received digital data of the external sound into inaudible sound data and audible sound data. The filter unit 47 outputs the inaudible sound data to the section detection unit 51 and the pattern detection unit 53, and outputs the audible sound data to the section buffer 52.
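 As an illustration, the band separation performed by the filter unit 47 could be realized as follows; the Butterworth design, its order, and the SciPy dependency are assumptions, while the band edges follow the example values above.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(x: np.ndarray, fs: float = 48_000.0):
    # Band edges follow the example values in the text (assumed fs of 48 kHz).
    sos_audible = butter(4, [20.0, 10_000.0], btype="bandpass", fs=fs, output="sos")
    sos_inaudible = butter(4, [18_000.0, 22_000.0], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos_audible, x), sosfilt(sos_inaudible, x)
```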
 When the digital data of the external sound from the sound collector 3 includes the digital data of the left and right external sounds, the filter unit 47 may, under the control of the control unit 50, separate the digital data of the left external sound into left inaudible sound data and left audible sound data. Likewise, under the control of the control unit 50, the filter unit 47 may separate the digital data of the right external sound into right inaudible sound data and right audible sound data. The filter unit 47 may output the left and right inaudible sound data to the section detection unit 51 and the pattern detection unit 53, and may output the left and right audible sound data to the section buffer 52. Hereinafter, when the left audible sound data and the right audible sound data need not be distinguished, they are simply referred to as the "audible sound data". Similarly, when the left inaudible sound data and the right inaudible sound data need not be distinguished, they are simply referred to as the "inaudible sound data".
 The control unit 50 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU or a GPU, or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA or an ASIC. The control unit 50 executes processing related to the operation of the processing device 4 while controlling each part of the processing device 4.
 The control unit 50 includes a section detection unit 51, a section buffer 52, a pattern detection unit 53, and a reproduction unit 54. The section buffer 52 may be a part of the storage unit 46. The operations of the section buffer 52 are executed by the processor or the like of the control unit 50.
 The section detection unit 51 acquires the inaudible sound data from the filter unit 47. Based on the inaudible sound data, the section detection unit 51 detects a sound section in which sound continues in the audible sound data. As described above, the output device 2 generates inaudible sound data that includes the audible sound information as a component. Therefore, when the audible sound data contains a sound section, the inaudible sound data corresponding to that sound section contains sound data other than noise. In this embodiment, the section detection unit 51 therefore detects the sound section by determining whether the inaudible sound data contains sound data other than noise. An example of the processing by the section detection unit 51 will be described below.
 The section detection unit 51 divides the inaudible sound data into frames each having a predetermined frame length. For example, the section detection unit 51 divides the inaudible sound data such that one frame contains several hundred to several thousand sound samples. In the same or a similar manner to the processing described above with reference to FIG. 3, the section detection unit 51 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length.
 After dividing the inaudible sound data into frames, the section detection unit 51 determines, for each frame, whether the frame contains sound data other than noise by the following processing.
 The section detection unit 51 applies a fast Fourier transform (FFT) to the sound data contained in each frame and obtains a frequency spectrum signal for each frame. This frequency spectrum signal corresponds to the frequency spectrum signal sg1 shown in FIG. 5, described later, and to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2. The section detection unit 51 calculates the power of the frequency spectrum signal for each frame, for example by squaring the frequency spectrum signal.
 The section detection unit 51 determines whether the power of the frequency spectrum signal of each frame is equal to or greater than a power threshold. The power threshold may be set in advance as a fixed value. Alternatively, the section detection unit 51 may set a power threshold for each frame. In this case, the section detection unit 51 may calculate the per-frame power threshold based on a statistical estimate of the noise power for each frame, such as a mean or a variance.
 When the section detection unit 51 determines that the power of the frequency spectrum signal is equal to or greater than the power threshold, it determines that the frame corresponding to that frequency spectrum signal contains sound data other than noise. When the section detection unit 51 determines that the power of the frequency spectrum signal is below the power threshold, it determines that the frame corresponding to that frequency spectrum signal does not contain sound data other than noise.
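 The per-frame power test can be sketched as follows; the threshold value is left as a parameter because, as noted above, it may be fixed or estimated statistically.

```python
import numpy as np

def frame_has_sound(frame: np.ndarray, power_threshold: float) -> bool:
    spectrum = np.fft.rfft(frame)                       # per-frame frequency spectrum
    power = float(np.sum(np.abs(spectrum) ** 2)) / len(frame)
    return power >= power_threshold                     # True: sound other than noise
```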
 The section detection unit 51 detects, as a sound section, a section in which frames determined to contain sound data other than noise continue. The section detection unit 51 may detect the sound section with hangover processing.
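 A hedged sketch of the hangover processing mentioned above follows: a detection is kept alive for a few extra frames so that short dips do not split one sound section. The hangover length of five frames is an assumed parameter.

```python
def apply_hangover(flags: list[bool], hangover: int = 5) -> list[bool]:
    out, remaining = [], 0
    for flag in flags:
        if flag:
            remaining = hangover              # re-arm the hangover counter
            out.append(True)
        else:
            out.append(remaining > 0)         # bridge short gaps
            remaining = max(0, remaining - 1)
    return out
```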
 The section detection unit 51 may detect the sound section by another method. For example, the section detection unit 51 may detect the sound section by separating a noise cluster from a cluster of sound data other than noise using a Gaussian mixture model (GMM).
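 The GMM-based alternative could, for example, cluster per-frame log powers into a noise cluster and a sound cluster; the use of scikit-learn and the two-component configuration are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_sound_flags(frame_powers: np.ndarray) -> np.ndarray:
    log_power = np.log(frame_powers + 1e-12).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(log_power)
    sound_cluster = int(np.argmax(gmm.means_.ravel()))  # louder cluster = sound
    return gmm.predict(log_power) == sound_cluster
```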
 The section detection unit 51 may acquire the left inaudible sound data and the right inaudible sound data from the filter unit 47. In this case, the section detection unit 51 may detect a sound section in which sound continues in the audible sound data based on both the left and right inaudible sound data. For example, in the same or a similar manner to the above, the section detection unit 51 divides the left inaudible sound data into frames to obtain left frames, and obtains a frequency spectrum signal for each left frame. Likewise, the section detection unit 51 divides the right inaudible sound data into frames to obtain right frames, and obtains a frequency spectrum signal for each right frame. In the same or a similar manner to the above, the section detection unit 51 determines whether the power of the frequency spectrum signal of each left and right frame is equal to or greater than the power threshold. Here, a frame whose frequency spectrum signal power exceeds the power threshold is referred to as a "True frame". The section detection unit 51 determines whether at least one of the left and right frames is a True frame. When the section detection unit 51 determines that at least one of the left and right frames is a True frame, it determines that the inaudible sound frame corresponding to the left and right frames contains sound data other than noise. Here, the inaudible sound frame corresponding to the left and right frames is the left inaudible sound frame and the right inaudible sound frame regarded as a single frame. On the other hand, when the section detection unit 51 does not determine that at least one of the left and right frames is a True frame, it determines that the inaudible sound frame corresponding to the left and right frames does not contain sound data other than noise. In the same or a similar manner to the above, the section detection unit 51 detects, as a sound section, a section in which frames determined to contain sound data other than noise continue. However, depending on the settings of the processing device 4, the section detection unit 51 may instead determine whether both the left and right frames are True frames. In this case, when the section detection unit 51 determines that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames contains sound data other than noise. On the other hand, when the section detection unit 51 does not determine that both the left and right frames are True frames, it determines that the inaudible sound frame corresponding to the left and right frames does not contain sound data other than noise.
 When the section detection unit 51 detects a sound section, it generates a section ID, which is identification information that uniquely identifies the sound section. The section detection unit 51 outputs the information of the sound section and the section ID to the section buffer 52 and the pattern detection unit 53. The information of the sound section includes the start point and end point of the sound section, which are specified, for example, by time.
 The section buffer 52 acquires the audible sound data from the filter unit 47, and acquires the information of the sound section and the section ID from the section detection unit 51. Based on the information of the sound section acquired from the section detection unit 51, the section buffer 52 extracts the audible sound data contained in the sound section from the audible sound data acquired from the filter unit 47. The section buffer 52 holds the extracted audible sound data of the sound section in association with the section ID.
 The section buffer 52 may acquire the left and right audible sound data from the filter unit 47. In this case, the section buffer 52 may extract and hold the left and right audible sound data of the sound section.
 When a predetermined time has elapsed after the section buffer 52 starts holding the audible sound data of a sound section, the section buffer 52 may delete the held audible sound data of that sound section. The predetermined time may be set based on the amount of data that the section buffer 52 can hold.
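 A minimal sketch of such time-limited holding follows; the class layout and the use of a monotonic clock are assumptions, not details given in the embodiment.

```python
import time

class SectionBuffer:
    """Holds per-section audible sound data and drops entries older than max_age seconds."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self._entries: dict[str, tuple[float, object]] = {}

    def store(self, section_id: str, samples) -> None:
        self._entries[section_id] = (time.monotonic(), samples)
        self.expire()

    def get(self, section_id: str):
        entry = self._entries.get(section_id)
        return entry[1] if entry else None

    def expire(self) -> None:
        now = time.monotonic()
        self._entries = {sid: (t, s) for sid, (t, s) in self._entries.items()
                         if now - t < self.max_age}
```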
 The pattern detection unit 53 acquires the inaudible sound data from the filter unit 47, and acquires the information of the sound section and the section ID from the section detection unit 51.
 Upon acquiring the information of the sound section and the section ID from the section detection unit 51, the pattern detection unit 53 extracts the inaudible sound data of the sound section from the inaudible sound data acquired from the filter unit 47.
 The pattern detection unit 53 determines whether the inaudible sound component of the extracted sound section satisfies a preset condition. In this embodiment, this condition is that at least a part of the frequency spectrum signal of the inaudible sound data of the sound section matches a setting pattern described later. The processing by the pattern detection unit 53 will be described below.
 The pattern detection unit 53 divides the extracted inaudible sound data of the sound section into frames each having a predetermined frame length. For example, the pattern detection unit 53 divides the inaudible sound data such that one frame contains several hundred to several thousand sound samples. In the same or a similar manner to the processing described above with reference to FIG. 3, the pattern detection unit 53 divides the inaudible sound data into frames at intervals of 1/2 to 1/4 of the frame length.
 The pattern detection unit 53 applies a fast Fourier transform to the digital sound data contained in each frame and obtains a frequency spectrum signal for each frame. This frequency spectrum signal corresponds to the spectrum signal of the fundamental frequency component extracted from the audible sound data by the extraction unit 19 of the output device 2.
 The pattern detection unit 53 determines whether the shape of at least a part of the frequency spectrum signal of the inaudible sound data of the sound section matches the setting pattern. The pattern detection unit 53 may make this determination by computing a two-dimensional cross-correlation.
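 The two-dimensional cross-correlation test could be sketched as follows; the normalization, the match threshold of 0.8, and the SciPy dependency are assumptions, and the spectrogram is assumed to be at least as large as the setting pattern in both dimensions.

```python
import numpy as np
from scipy.signal import correlate2d

def pattern_matches(spectrogram: np.ndarray, pattern: np.ndarray,
                    threshold: float = 0.8) -> bool:
    s = (spectrogram - spectrogram.mean()) / (spectrogram.std() + 1e-12)
    p = (pattern - pattern.mean()) / (pattern.std() + 1e-12)
    corr = correlate2d(s, p, mode="valid") / p.size   # normalized 2-D cross-correlation
    return bool(corr.max() >= threshold)
```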
 The setting pattern is generated, for example, based on preset audible sound data. For example, the setting pattern is generated from a preset fundamental frequency of an audible sound, the length of that audible sound, and the speed of that audible sound. The setting pattern may also be generated based on the shape of the spectrum signal obtained when the preset audible sound data is converted into a spectrum signal of fundamental frequency components. For example, suppose the output device 2 is used in an airport waiting room, and the user does not want to miss information about "Flight 153". In this case, the setting pattern is generated based on the audible sound data of "Flight 153" output by the output device 2.
 For example, FIG. 5 shows a graph of a frequency spectrum signal sg1, which the pattern detection unit 53 obtained by applying a fast Fourier transform to inaudible sound data. In FIG. 5, the horizontal axis indicates time and the vertical axis indicates frequency. The upper part of FIG. 5 shows the digital data of the audible sound for reference. In FIG. 5, the section from time t1 to time t2 is the sound section. The pattern detection unit 53 determines that the frequency spectrum signal sg2 contained in the dotted-line portion of the frequency spectrum signal sg1 matches the setting pattern.
 When the pattern detection unit 53 determines that the shape of at least a part of the frequency spectrum signal of the inaudible sound of the sound section matches the setting pattern, it outputs a reproduction trigger and the section ID to the reproduction unit 54.
 The pattern detection unit 53 may acquire the left and right inaudible sound data from the filter unit 47. In this case, the pattern detection unit 53 may determine whether the components of both the left and right inaudible sounds in the sound section satisfy the preset condition.
 The reproduction unit 54 acquires the reproduction trigger and the section ID from the pattern detection unit 53. Upon acquiring the reproduction trigger, the reproduction unit 54 executes reproduction processing. In the reproduction processing, the reproduction unit 54 acquires the audible sound data associated with the section ID from the section buffer 52. The reproduction unit 54 sets the replay flag to True and transmits the acquired audible sound data to the sound collector 3 as reproduction data through the communication unit 40. After transmitting all the reproduction data to the sound collector 3, the reproduction unit 54 sets the replay flag to False.
 When the section buffer 52 holds the left and right audible sound data of the sound section, the reproduction unit 54 may transmit the left audible sound data to the sound collector 3 as left reproduction data and the right audible sound data as right reproduction data.
 <Notification processing>
 When the pattern detection unit 53 determines that the inaudible sound component of the sound section satisfies the preset condition, the control unit 50 notifies the user through the notification unit 42 that a sound satisfying the condition has been detected.
 As an example, the control unit 50 may cause the display unit 43 to display the text corresponding to the setting pattern and the detection time at which the sound satisfying the condition was detected. The text corresponding to the setting pattern is a verbalization of the audible sound used to generate the setting pattern. For example, when the text corresponding to the setting pattern is "Flight 512" and the detection time is 11:11, the control unit 50 causes the display unit 43 to display the information "Flight 512, detection time 11:11".
 As another example, the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the vibration unit 44 to vibrate the processing device 4. As yet another example, the control unit 50 may notify the user that a sound satisfying the condition has been detected by causing the light emitting unit 45 to emit light.
 After notifying the user, the control unit 50 may receive a reproduction instruction through the input unit 41. When the text and the detection time are displayed on the display unit 43 and the input unit 41 is a touch screen provided integrally with the display of the display unit 43, the reproduction instruction may be a touch operation on the text and the detection time displayed on the display unit 43. Upon receiving the reproduction instruction, the control unit 50 outputs, to the reproduction unit 54, the section ID corresponding to the text and detection time on which the touch operation was received, together with a reproduction trigger. Upon acquiring the section ID and the reproduction trigger, the reproduction unit 54 executes the reproduction processing as described above.
 (Operation of the output device)
 FIG. 6 is a flowchart showing the flow of the sound output processing executed by the output device 2 shown in FIG. 2. In the following, it is assumed that the user inputs text data from the input unit 10. When the user inputs the text data from the input unit 10, the control unit 18 starts the processing of step S1.
 The control unit 18 receives the input of the text data through the input unit 10 (step S1). The control unit 18 causes the conversion unit 12 to convert the text data received in step S1 into audible sound data (step S2). The control unit 18 electrically connects the conversion unit 12 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 Upon acquiring the audible sound data via the switch 13, the extraction unit 19 divides the acquired audible sound data into frames (step S3). The extraction unit 19 extracts, for each frame, the fundamental frequency from the audible sound samples contained in the frame (step S4).
 Upon acquiring the fundamental frequency data for each frame from the extraction unit 19, the generation unit 20 generates a sine wave for each frame based on the fundamental frequency data (step S5). After generating the sine waves, the generation unit 20 synthesizes the per-frame sine waves to generate the inaudible sound data (step S6).
 The control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate sound data (step S7). The control unit 18 causes the speaker 16 to convert the sound data generated in step S7 into sound and output it (step S8).
 In the processing of step S1, the control unit 18 may instead receive the input of audible sound data through the input unit 11. In this case, instead of performing the conversion of step S2, the control unit 18 electrically connects the input unit 11 to the delay buffer 14 and the extraction unit 19 via the switch 13.
 (Operation of the sound collector)
 FIG. 7 is a flowchart showing the flow of the sound collection processing executed by the sound collector 3 shown in FIG. 4. For example, when the control unit 34 receives an instruction to start the sound collection processing from the processing device 4 through the communication unit 32, it starts the processing of step S11.
 The control unit 34 determines whether the replay flag from the processing device 4 is True (step S11). When the control unit 34 determines that the replay flag is True (step S11: YES), the processing proceeds to step S12. When the control unit 34 determines that the replay flag is False (step S11: NO), the processing proceeds to step S13.
 In the processing of step S12, the control unit 34 performs control so as to operate in the reproduction mode. The reproduction unit 36 causes the speaker 31 to output the reproduction data accumulated in the accumulation unit 37.
 In the processing of step S13, the control unit 34 performs control so as to operate in the through mode. The reproduction unit 36 causes the speaker 31 to output the digital data of the external sound acquired from the acquisition unit 35.
 After the processing of step S12 or step S13, the control unit 34 returns to the processing of step S11.
 (Operation of the processing device)
 FIGS. 8 and 9 are flowcharts showing the flow of the sound acquisition processing executed by the processing device 4 shown in FIG. 4. For example, when the transmission of the digital data of the external sound from the sound collector 3 to the processing device 4 starts, the control unit 50 starts the processing of step S21.
 The control unit 50 receives the digital data of the external sound from the sound collector 3 through the communication unit 40 (step S21). The control unit 50 causes the filter unit 47 to separate the received digital data of the external sound into inaudible sound data and audible sound data (step S22).
 The section detection unit 51 divides the inaudible sound data acquired from the filter unit 47 into frames (step S23). The section detection unit 51 detects a sound section by determining whether each frame contains sound data other than noise (step S24).
 Based on the information of the sound section acquired from the section detection unit 51, the section buffer 52 extracts the audible sound data contained in the sound section from the audible sound data acquired from the filter unit 47. The section buffer 52 holds the extracted audible sound data of the sound section in association with the section ID (step S25).
 The pattern detection unit 53 divides the inaudible sound data of the sound section detected in step S24 into frames (step S26).
 The pattern detection unit 53 applies a fast Fourier transform to the digital sound data contained in each frame and obtains a frequency spectrum signal for each frame (step S27).
 The pattern detection unit 53 determines whether the shape of at least a part of the frequency spectrum signal of the inaudible sound data of the sound section matches the setting pattern (step S28).
 When the pattern detection unit 53 determines that the shape of at least a part of the frequency spectrum signal of the inaudible sound of the sound section matches the setting pattern (step S28: YES), the processing device 4 proceeds to the processing of step S29. On the other hand, when the pattern detection unit 53 does not determine that the shape of at least a part of the frequency spectrum signal of the inaudible sound of the sound section matches the setting pattern (step S28: NO), the processing device 4 returns to the processing of step S21.
 In the processing of step S29, upon acquiring the reproduction trigger and the section ID from the pattern detection unit 53, the reproduction unit 54 sets the replay flag to True. The reproduction unit 54 acquires the audible sound data associated with the section ID from the section buffer 52 and transmits the acquired audible sound data as reproduction data to the sound collector 3 through the communication unit 40 (step S30).
 The reproduction unit 54 determines whether all the reproduction data has been transmitted to the sound collector 3 (step S31). When the reproduction unit 54 determines that all the reproduction data has been transmitted to the sound collector 3 (step S31: YES), it sets the replay flag to False (step S32). When the reproduction unit 54 does not determine that all the reproduction data has been transmitted to the sound collector 3 (step S31: NO), it returns to the processing of step S30.
 In the processing of step S33, the control unit 50 notifies the user through the notification unit 42 that a sound satisfying the preset condition has been detected.
 The control unit 50 determines whether a reproduction instruction has been received through the input unit 41 (step S34). When the control unit 50 determines that a reproduction instruction has been received through the input unit 41 (step S34: YES), the processing proceeds to step S35 shown in FIG. 9. On the other hand, when the control unit 50 does not determine that a reproduction instruction has been received through the input unit 41 (step S34: NO), it repeats the processing of step S34. The control unit 50 may end the sound acquisition processing when a predetermined time has elapsed while the processing of step S34 is being repeated. The predetermined time may be set based on the specifications of the processing device 4.
 The control unit 50 executes the processing of steps S35, S36, S37, and S38 shown in FIG. 9 in the same or a similar manner to steps S29, S30, S31, and S32 shown in FIG. 8. After the processing of step S38, the control unit 50 returns to the processing of step S21.
 As described above, in this embodiment, the control unit 18 of the output device 2 receives, through the input unit 10, text data of an audible sound such as an announcement sound to be output from the output device 2. Alternatively, the control unit 18 receives, through the input unit 11, data of an audible sound such as an announcement sound to be output from the output device 2. The control unit 18 generates an inaudible sound that includes the information of the received audible sound as a component, and outputs, through the speaker 16, a sound in which the audible-range sound and the inaudible-range sound are superimposed. Inaudible sounds cannot be heard by most people. However, some people can hear inaudible sounds and may find them unpleasant. In this embodiment, by outputting a sound in which the audible sound and the inaudible sound are superimposed from the speaker 16, the possibility that a person who can hear inaudible sounds finds the inaudible sound unpleasant can be reduced.
 また、本実施形態では、処理装置4の制御部50は、集音器3を介して外部音のデータを通信部40によって受信することにより取得する。外部音のデータには、出力装置2が出力したアナウンス音のデータが含まれ得る。制御部50は、外部音の非可聴音の成分によって、外部音のうち、出力装置2が出力したアナウンス音等の可聴音を検出する。ここで、非可聴音は、可聴音と比較すると、自然界には少ない音である。つまり、騒音の多くは、可聴音である。したがって、外部音に騒音が含まれる場合でも、制御部50は、外部音の非可聴音の成分によって、出力装置2が出力した可聴音を精度良く検出することができる。 Furthermore, in the present embodiment, the control unit 50 of the processing device 4 acquires external sound data by having the communication unit 40 receive the external sound data via the sound collector 3 . The external sound data may include announcement sound data output by the output device 2. The control unit 50 detects an audible sound such as an announcement sound outputted by the output device 2 from among the external sounds based on the inaudible sound component of the external sound. Here, inaudible sounds are rare in nature compared to audible sounds. In other words, most of the noise is audible. Therefore, even if the external sound includes noise, the control unit 50 can accurately detect the audible sound output by the output device 2 based on the inaudible sound component of the external sound.
 よって、本実施形態によれば、音を精度良く検出するための技術を提供することができる。 Therefore, according to the present embodiment, it is possible to provide a technique for detecting sound with high accuracy.
 さらに、本実施形態では、処理装置4の制御部50は、外部音の非可聴音の成分が予め設定された条件を満たす場合、ユーザに条件を満たす音が検出されたことを通知してもよい。このような構成により、ユーザは、条件を満たす音が検出されたことを知ることができる。 Furthermore, in the present embodiment, if the inaudible sound component of the external sound satisfies a preset condition, the control unit 50 of the processing device 4 may notify the user that a sound satisfying the condition has been detected. good. With such a configuration, the user can know that a sound satisfying the condition has been detected.
 また、本実施形態では、処理装置4の制御部50は、条件を満たす音が検出されたことをユーザに通知した後、再生指示を入力部41によって受け付けると、音区間の可聴域の音のデータを再生する再生処理を実行してもよい。例えば、再生部54は、音区間の可聴域の音のデータを再生データとして集音器3に送信し、集音器3によって再生させてもよい。ここで、ユーザは、空港の待合室又は駅構内等において、出力装置2が出力するアナウンス音に注意を払っていることが少ない。音区間の可聴域の音のデータを再生することにより、ユーザは、聞き逃したアナウンス音を再度聞くことができる。 In the present embodiment, when the input unit 41 receives a playback instruction after notifying the user that a sound satisfying the condition has been detected, the control unit 50 of the processing device 4 detects a sound in the audible range of the sound interval. A playback process for playing back the data may also be executed. For example, the playback unit 54 may transmit the data of the audible range sound in the sound section to the sound collector 3 as playback data, and the sound collector 3 may play the data. Here, users rarely pay attention to the announcement sound output by the output device 2 in an airport waiting room, a station premises, or the like. By reproducing the sound data in the audible range of the sound section, the user can listen again to the missed announcement sound.
 また、本実施形態では、処理装置4の制御部50は、外部音の非可聴音の成分が予め設定された条件を満たす場合、外部音の可聴域の音を再生する再生処理を実行してもよい。例えば、再生部54は、パターン検出部53から再生トリガ及び区間IDを取得すると、区間バッファ52から区間IDに対応する可聴音のデータを取得してよい。再生部54は、取得した可聴音のデータを集音器3に送信して集音器3によって再生させてもよい。このような構成により、ユーザは、速やかに音を聞くことができる。 Furthermore, in the present embodiment, the control unit 50 of the processing device 4 executes a reproduction process for reproducing the audible range of the external sound when the inaudible sound component of the external sound satisfies a preset condition. Good too. For example, upon acquiring the reproduction trigger and section ID from the pattern detection section 53, the reproduction section 54 may acquire audible sound data corresponding to the section ID from the section buffer 52. The reproduction unit 54 may transmit the acquired audible sound data to the sound collector 3 and have the sound collector 3 reproduce it. With such a configuration, the user can quickly hear the sound.
(Other embodiments)
 Processing systems according to other embodiments are described below. A processing system according to another embodiment includes an output device 2 as shown in FIG. 2, a sound collector 3 as shown in FIG. 4, and a processing device 104 as shown in FIG. 10, described later.
(Configuration of output device)
 An output device 2 according to another embodiment is described with reference to FIG. 2.
 The generation unit 20 according to this other embodiment generates, for each frame, a sine wave based on the fundamental-frequency data, in the same or a similar manner as in the embodiment described above. In this other embodiment, however, the generation unit 20 generates the sine wave x(t) at time t by, for example, equation (2), which yields a sine wave carrying the fundamental frequency f0 as a component:
   x(t) = A sin{2πt·(n·f0 + X)}   (2)
 In equation (2), the value n is a number of 1.0 or more, set within a range in which (n·f0 + X) does not exceed the sampling rate of the sound collector 3. The value n may also be set based on the resolution of the fast Fourier transform performed by the processing device 104; for example, the lower the resolution of the fast Fourier transform, the larger n may be set.
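 As a rough illustration of equation (2), the following Python sketch generates a per-frame inaudible carrier; the sampling rate, frame length, amplitude A, offset X, and multiplier n are hypothetical values chosen only for this example and are not values taken from the present disclosure.

    import numpy as np

    def inaudible_carrier(f0_per_frame, fs=48_000, frame_len=1024,
                          A=0.05, X=18_000.0, n=1.0):
        """Sketch of equation (2): x(t) = A*sin(2*pi*t*(n*f0 + X)), per frame.

        f0_per_frame : fundamental frequency [Hz] estimated for each frame
        fs           : output sampling rate [Hz] (hypothetical value)
        (n*f0 + X) is assumed to stay above the audible band and within
        the rate the sound collector can sample.
        """
        frames = []
        for i, f0 in enumerate(f0_per_frame):
            # absolute time axis, following the variable t in equation (2)
            t = (i * frame_len + np.arange(frame_len)) / fs
            frames.append(A * np.sin(2 * np.pi * t * (n * f0 + X)))
        return np.concatenate(frames)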
 After generating a sine wave for each frame, the generation unit 20 synthesizes the per-frame sine waves to generate the inaudible sound data, in the same or a similar manner as in the embodiment described above. After generating the inaudible sound data, the generation unit 20 outputs it to the superimposition unit 15, as in the embodiment described above.
 As in the embodiment described above, the control unit 18 causes the superimposition unit 15 to superimpose the audible sound data from the delay buffer 14 and the inaudible sound data from the generation unit 20 to generate the sound data. In this other embodiment, however, when superimposing the audible sound data and the inaudible sound data, the control unit 18 may correct a phase shift between the two by phase adjustment. The output device 2 may further include a high-pass filter (HPF) for removing audible-range noise that arises when the audible sound data and the inaudible sound data are superimposed.
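 As one possible realization of such a high-pass filter, the sketch below applies a Butterworth HPF to the generated inaudible component before superimposition; both this placement and the cutoff frequency and filter order are hypothetical choices for the example, not details fixed by the present disclosure.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def highpass(x, fs=48_000, cutoff=17_000.0, order=8):
        """Suppress audible-range leakage in the inaudible component (sketch)."""
        sos = butter(order, cutoff, btype="highpass", fs=fs, output="sos")
        return sosfilt(sos, x)

    # e.g. mixed = audible + highpass(inaudible, fs=48_000)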
 As in the embodiment described above, the control unit 18 causes the speaker 16 to convert the sound data from the superimposition unit 15 into sound and output it.
 The other configurations of the output device 2 according to this other embodiment are the same as or similar to those of the output device 2 according to the embodiment described above.
(Configuration of processing device)
 As shown in FIG. 10, a processing device 104 according to another embodiment includes a communication unit 40, an input unit 41, a notification unit 42, a storage unit 46, a filter unit 47, and a control unit 150.
 The storage unit 46 according to this other embodiment stores keywords set in advance by the user. The user sets keywords related to information that the user does not want to miss. For example, if the user does not want to miss information regarding "Flight 153", the user sets the keyword "Flight 153" and stores it in the storage unit 46.
 The storage unit 46 according to this other embodiment also stores information on the constant X and the value n used in equation (2).
 The control unit 150 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU or a GPU, or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA or an ASIC. The control unit 150 executes processing related to the operation of the processing device 104 while controlling each part of the processing device 104.
 The control unit 150 includes a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61. The buffer 55 may be part of the storage unit 46. The operations of the buffer 55 are executed by the processor of the control unit 150 or the like.
 As in the embodiment described above, the control unit 150 receives the external sound data from the sound collector 3 through the communication unit 40 and inputs the received external sound data to the filter unit 47. As in the embodiment described above, the filter unit 47 separates the external sound data into inaudible sound data and audible sound data, outputs the inaudible sound data to the section detection unit 51, and outputs the audible sound data to the section buffer 52 and the buffer 55.
 As in the embodiment described above, the section detection unit 51 acquires the inaudible sound data from the filter unit 47 and, based on the inaudible sound data, detects a sound section in which sound continues in the audible sound data. Upon detecting a sound section, the section detection unit 51 generates a section ID and outputs the information on the sound section and the section ID to the section buffer 52, as in the embodiment described above. In this other embodiment, the section detection unit 51 also outputs the information on the sound section and the section ID to the buffer 55.
 In this other embodiment, the section detection unit 51 extracts the inaudible sound data of the sound section from the inaudible sound data acquired from the filter unit 47, and outputs the extracted inaudible sound data of the sound section and the section ID to the first extraction unit 56.
 The section detection unit 51 may acquire left-channel and right-channel inaudible sound data from the filter unit 47. In this case, the section detection unit 51 extracts the left-channel inaudible sound data of the sound section from the left-channel inaudible sound data, and the right-channel inaudible sound data of the sound section from the right-channel inaudible sound data. The section detection unit 51 may output the extracted left-channel and right-channel inaudible sound data of the sound section and the section ID to the first extraction unit 56.
 As in the embodiment described above, the section buffer 52 acquires the audible sound data from the filter unit 47 and acquires the information on the sound section and the section ID from the section detection unit 51. Based on the information on the sound section acquired from the section detection unit 51, the section buffer 52 extracts, from the audible sound data acquired from the filter unit 47, the audible sound data included in the sound section, and holds the extracted audible sound data of the sound section in association with the section ID.
 In this other embodiment, the section buffer 52 outputs the audible sound data included in the extracted sound section and the section ID to the removal unit 58. When the section buffer 52 has extracted left-channel and right-channel audible sound data of the sound section, it may output the left-channel and right-channel audible sound data of the sound section and the section ID to the removal unit 58.
 The buffer 55 acquires the audible sound data from the filter unit 47 and acquires the information on the sound section and the section ID from the section detection unit 51. Based on the information on the sound section acquired from the section detection unit 51, the buffer 55 extracts, from the audible sound data acquired from the filter unit 47, the audible sound data included in sections other than the sound section, and holds the extracted data.
 The buffer 55 outputs the audible sound data included in the sections other than the sound section and the section ID to the second extraction unit 57.
 The buffer 55 may acquire left-channel and right-channel audible sound data from the filter unit 47. In this case, the buffer 55 may extract the left-channel and right-channel audible sound data included in the sections other than the sound section, and may output them together with the section ID to the second extraction unit 57.
 The first extraction unit 56 acquires the inaudible sound data of the sound section and the section ID from the section detection unit 51, and extracts the fundamental frequency of the audible sound of that sound section from the acquired inaudible sound data. As described above, the output device 2 uses equation (2) to generate inaudible sound data that contains information on the fundamental frequency of the audible sound; the inaudible data of a sound section therefore carries the information on the fundamental frequency of that section. An example of the process of extracting the fundamental frequency of the audible sound is described below.
 As in the process described above with reference to FIG. 3, the first extraction unit 56 divides the inaudible sound data into frames having a predetermined frame length. Having divided the inaudible sound data into frames, the first extraction unit 56, in the same or a similar manner as the section detection unit 51, applies a fast Fourier transform to the sound data of each frame to obtain a frequency spectrum signal for each frame. For example, the first extraction unit 56 applies the fast Fourier transform to several thousand samples of sound data every several hundred samples.
 Upon obtaining the frequency spectrum signal for each frame, the first extraction unit 56 calculates the power of each frame's frequency spectrum signal, for example by squaring the frequency spectrum signal. The first extraction unit 56 then extracts, from the per-frame frequency spectrum signals, those whose power is greater than or equal to a power threshold. The power threshold may be the same as the power threshold used by the section detection unit 51, or may be set separately.
 Having extracted the frequency spectrum signals whose power is greater than or equal to the power threshold, the first extraction unit 56 extracts the fundamental-frequency information from them. For example, the first extraction unit 56 extracts the fundamental frequency f0 of equation (2) from the extracted frequency spectrum signal and from the information on the constant X and the value n of equation (2) stored in the storage unit 46. The fundamental frequency extracted in this way by the first extraction unit 56 is the fundamental frequency of the audible sound output by the output device 2.
 The first extraction unit 56 outputs the extracted fundamental-frequency information and the section ID to the emphasis unit 59.
 The first extraction unit 56 may acquire the left-channel and right-channel inaudible sound data of the sound section and the section ID from the section detection unit 51. In this case, the first extraction unit 56 may extract the fundamental-frequency information from either the left-channel or the right-channel inaudible sound data of the sound section.
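 A minimal sketch of this fundamental-frequency extraction follows, assuming the dominant peak of the inaudible band lies at n·f0 + X so that f0 = (peak frequency − X)/n; the frame length, sampling rate, X, n, and power threshold are hypothetical example values.

    import numpy as np

    def extract_f0(frame, fs=48_000, X=18_000.0, n=1.0, power_th=1e-4):
        """Recover f0 from one frame of inaudible sound data (sketch)."""
        spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
        power = np.abs(spectrum) ** 2
        k = int(np.argmax(power))
        if power[k] < power_th:
            return None  # no sufficiently strong inaudible component
        peak_freq = k * fs / len(frame)
        return (peak_freq - X) / n  # invert equation (2)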
 The second extraction unit 57 acquires, from the buffer 55, the audible sound data included in the sections other than the sound section and the section ID, and extracts noise-component information based on that data. The audible sound data included in sections other than the sound section is unlikely to be audible sound data output from the output device 2; rather, it is highly likely to be noise data. The second extraction unit 57 therefore extracts the noise-component information from the audible sound data of the sections other than the sound section. An example of the noise extraction process is described below.
 As in the process described above with reference to FIG. 3, the second extraction unit 57 divides the audible sound data included in the sections other than the sound section into frames having a predetermined frame length. Having divided the audible sound data into frames, the second extraction unit 57, in the same or a similar manner as the section detection unit 51, applies a fast Fourier transform to the sound data of each frame to obtain a frequency spectrum signal for each frame. For example, the second extraction unit 57 applies the fast Fourier transform to several thousand samples of sound data every several hundred samples.
 The second extraction unit 57 takes the per-frame frequency spectrum signals as the noise frequency spectrum signal; that is, the second extraction unit 57 acquires the noise frequency spectrum signal as the noise-component information. The second extraction unit 57 outputs the noise frequency spectrum signal and the section ID to the removal unit 58.
 The second extraction unit 57 may acquire, from the buffer 55, the left-channel and right-channel audible sound data included in the sections other than the sound section, and the section ID. In this case, the second extraction unit 57 may extract left-channel noise-component information from the left-channel audible sound data and right-channel noise-component information from the right-channel audible sound data of those sections. The second extraction unit 57 may output the extracted left-channel and right-channel noise-component information, for example the noise frequency spectrum signals, and the section ID to the removal unit 58.
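 The noise-component information might, for instance, be summarized as an average magnitude spectrum over the frames outside the sound section, as in this sketch; note that the present disclosure keeps a spectrum per frame, so the averaging and the framing parameters here are simplifying assumptions for the example.

    import numpy as np

    def estimate_noise_spectrum(non_section_audio, frame_len=2048, hop=512):
        """Average magnitude spectrum of audio outside the sound section (sketch)."""
        window = np.hanning(frame_len)
        spectra = [np.abs(np.fft.rfft(non_section_audio[i:i + frame_len] * window))
                   for i in range(0, len(non_section_audio) - frame_len + 1, hop)]
        return np.mean(spectra, axis=0)  # noise-component information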
 The removal unit 58 acquires the noise frequency spectrum signal and the section ID from the second extraction unit 57 as the noise-component information, and acquires the section ID and the audible sound data from the section buffer 52. Based on the noise-component information, the removal unit 58 removes the noise component from the audible sound data acquired from the section buffer 52. An example of the noise-removal process is described below.
 As in the process described above with reference to FIG. 3, the removal unit 58 divides the audible sound data of the sound section into frames having a predetermined frame length. Having divided the audible sound data into frames, the removal unit 58, in the same or a similar manner as the section detection unit 51, applies a fast Fourier transform to the sound data of each frame to obtain a frequency spectrum signal for each frame. For example, the removal unit 58 applies the fast Fourier transform to several thousand samples of sound data every several hundred samples.
 Based on the noise frequency spectrum signal acquired from the second extraction unit 57, the removal unit 58 removes the noise component from the per-frame frequency spectrum signal of the audible sound. The removal unit 58 may remove the noise component by any method, such as the spectral subtraction method or a Wiener filter.
 The removal unit 58 outputs the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID to the emphasis unit 59.
 The removal unit 58 may acquire the left-channel and right-channel audible sound data of the sound section and the section ID from the section buffer 52, and may acquire the left-channel and right-channel noise-component information and the section ID from the second extraction unit 57. In this case, the removal unit 58 may remove the left-channel noise component from the left-channel audible sound data of the sound section and the right-channel noise component from the right-channel audible sound data. The removal unit 58 may output the left-channel and right-channel audible sound data from which the noise components have been removed, for example as frequency spectrum signals, together with the section ID, to the emphasis unit 59.
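 As an example of the spectral subtraction method named above, the estimated noise magnitude could be subtracted per frame as in this sketch; the subtraction factor alpha and the spectral floor are hypothetical tuning values.

    import numpy as np

    def spectral_subtraction(frame_spectrum, noise_mag, alpha=1.0, floor=0.01):
        """Subtract an estimated noise magnitude from one frame (sketch).

        frame_spectrum : complex rFFT of one audible-sound frame
        noise_mag      : noise magnitude estimate, e.g. from
                         estimate_noise_spectrum() above
        """
        mag = np.abs(frame_spectrum)
        phase = np.angle(frame_spectrum)
        clean = np.maximum(mag - alpha * noise_mag, floor * mag)  # keep a floor
        return clean * np.exp(1j * phase)  # reuse the noisy phase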
 The emphasis unit 59 acquires the frequency spectrum signal of the audible sound from which the noise component has been removed and the section ID from the removal unit 58, and acquires the fundamental-frequency information and the section ID from the first extraction unit 56. Based on the fundamental-frequency information acquired from the first extraction unit 56, the emphasis unit 59 emphasizes, within the frequency spectrum signal of the audible sound acquired from the removal unit 58, the frequency spectrum signal of the audible sound output by the output device 2. The frequency of the glottal source produced by the vocal cords of a typical person is the fundamental frequency, and a typical human voice is generated from this glottal source by a spectral envelope in which the vocal tract acts as a vocal-tract filter. In other words, a typical human voice is generated by the convolution of integer multiples of the fundamental frequency, that is, harmonics, with the vocal-tract filter. The emphasis unit 59 therefore strengthens the power [dB] at frequencies that are integer multiples of the fundamental frequency acquired from the first extraction unit 56 within the frequency spectrum signal of the audible sound acquired from the removal unit 58. As described above, the fundamental frequency acquired from the first extraction unit 56 is the fundamental frequency of the sound output by the output device 2; by strengthening the power [dB] at integer multiples of this fundamental frequency, the frequency spectrum signal of the audible sound output by the output device 2 can be emphasized.
 As an example of the emphasis process, the emphasis unit 59 increases the power [dB] in the frequency ranges (m×f0 ± F) of the frequency spectrum signal of the audible sound acquired from the removal unit 58, as shown in FIG. 11. In FIG. 11, the horizontal axis indicates frequency and the vertical axis indicates the power gain of the frequency spectrum signal. The integer m satisfies 1 ≤ m ≤ M, and the integer M, the upper limit of m, may be set depending on the environment in which the processing system 1 is used. The fundamental frequency f0 is the fundamental frequency of the audible sound extracted by the first extraction unit 56, and the constant F is set based on, for example, fluctuations in the frequency of the audible sound output from the output device 2. In FIG. 11, the power gain at the frequency (m×f0) is set to B (where 1 < B), the power gain at each of the frequencies (m×f0 ± F) is set to 1, and the power gain is set so as to decay linearly from the frequency (m×f0) toward each of the frequencies (m×f0 ± F).
 After emphasizing the frequency spectrum signal of the audible sound output by the output device 2, the emphasis unit 59 outputs the emphasized frequency spectrum signal of the audible sound and the section ID to the conversion unit 60.
 The emphasis unit 59 may acquire, from the removal unit 58, the left-channel and right-channel frequency spectrum signals of the audible sound from which the noise components have been removed, and the section ID. In this case, the emphasis unit 59 may emphasize each of the left-channel and right-channel frequency spectrum signals of the audible sound, and may output the emphasized left-channel and right-channel frequency spectrum signals and the section ID to the conversion unit 60.
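 The triangular gain profile of FIG. 11 might be applied per frame as in the following sketch; the gain B, half-width F, and harmonic count M are hypothetical example values.

    import numpy as np

    def harmonic_emphasis(frame_spectrum, f0, fs=48_000, B=2.0, F=20.0, M=40):
        """Apply a gain peaking at B on each harmonic m*f0 and decaying
        linearly to 1 at m*f0 +/- F (sketch of the FIG. 11 profile)."""
        n_bins = len(frame_spectrum)
        freqs = np.arange(n_bins) * fs / (2 * (n_bins - 1))  # rFFT bin frequencies
        gain = np.ones(n_bins)
        for m in range(1, M + 1):
            center = m * f0
            if center + F > fs / 2:
                break  # stay below the Nyquist frequency
            tri = 1.0 + (B - 1.0) * np.clip(1.0 - np.abs(freqs - center) / F, 0.0, 1.0)
            gain = np.maximum(gain, tri)
        return frame_spectrum * gain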
 The conversion unit 60 acquires the frequency spectrum signal of the audible sound and the section ID from the emphasis unit 59, and converts the frequency spectrum signal of the audible sound into time-domain audible sound data by an inverse short-time Fourier transform (ISTFT). The conversion unit 60 outputs the converted time-domain audible sound data and the section ID to the playback unit 54 and the recognition unit 61.
 The conversion unit 60 may acquire the left-channel and right-channel frequency spectrum signals of the audible sound and the section ID from the emphasis unit 59. In this case, the conversion unit 60 may convert the left-channel frequency spectrum signal into time-domain left-channel audible sound data and the right-channel frequency spectrum signal into time-domain right-channel audible sound data, and may output the time-domain left-channel and right-channel audible sound data to the playback unit 54 and the recognition unit 61.
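 Assuming the per-frame spectra came from a matching short-time Fourier transform, the conversion back to the time domain could use scipy's ISTFT, as in this sketch; the window and hop sizes are hypothetical and must match the analysis side.

    import numpy as np
    from scipy.signal import istft

    def to_time_domain(frame_spectra, fs=48_000, frame_len=2048, hop=512):
        """ISTFT: stack of per-frame rFFT spectra -> time-domain audio (sketch)."""
        Zxx = np.asarray(frame_spectra).T  # scipy expects (freq bins, frames)
        _, audio = istft(Zxx, fs=fs, window="hann",
                         nperseg=frame_len, noverlap=frame_len - hop)
        return audio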
 The recognition unit 61 acquires the time-domain audible sound data and the section ID from the conversion unit 60, and acquires text data of the audible sound by executing speech recognition processing on the acquired audible sound data. The recognition unit 61 determines whether the acquired text data of the audible sound contains a keyword stored in the storage unit 46. If the recognition unit 61 determines that the text data of the audible sound contains a keyword, it outputs a playback trigger and the section ID to the playback unit 54.
 The recognition unit 61 may acquire the time-domain left-channel and right-channel audible sound data from the conversion unit 60 and acquire text data of the left-channel and right-channel audible sounds. In this case, if the recognition unit 61 determines that either the left-channel or the right-channel text data contains a keyword stored in the storage unit 46, it may output the playback trigger and the section ID to the playback unit 54.
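 The keyword check itself reduces to a substring test over the recognized transcript, as in this sketch; recognize() stands in for an unspecified speech-recognition backend and is hypothetical.

    def keyword_hit(audio, keywords, recognize):
        """Return True (a playback trigger) if any stored keyword appears (sketch).

        recognize : callable mapping time-domain audio to a text transcript
        keywords  : keywords preset by the user, e.g. ["Flight 153"]
        """
        text = recognize(audio)
        return any(kw in text for kw in keywords)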
 The playback unit 54 acquires the converted time-domain audible sound data and the section ID from the conversion unit 60, and acquires the playback trigger and the section ID from the recognition unit 61. Upon acquiring the playback trigger from the recognition unit 61, the playback unit 54 executes the playback process. In the playback process according to this other embodiment, the playback unit 54 sets the replay flag to True, takes as playback data the audible sound data, among the data acquired from the conversion unit 60, whose section ID matches the section ID acquired together with the playback trigger, and transmits the playback data to the sound collector 3 through the communication unit 40. After transmitting all the playback data to the sound collector 3, the playback unit 54 sets the replay flag to False.
 The playback unit 54 may acquire the time-domain left-channel and right-channel audible sound data and the section ID from the conversion unit 60. In this case, the playback unit 54 may use the time-domain left-channel and right-channel audible sound data as the playback data.
 The other configurations of the processing device 104 are the same as or similar to those of the processing device 4 according to the embodiment described above.
(Operation of output device)
 The operation of the output device 2 according to this other embodiment is described by the flowchart shown in FIG. 6, except that in the process of step S5 the generation unit 20 generates the sine wave using equation (2).
(Operation of sound collector)
 The operation of the sound collector 3 according to this other embodiment is described by the flowchart shown in FIG. 7.
(Operation of processing device)
 FIG. 12 is a flowchart showing the flow of the sound acquisition process executed by the processing device 104 according to this other embodiment. For example, the control unit 150 starts the process of step S41 when transmission of the digital data of the external sound from the sound collector 3 to the processing device 104 is started.
 The processing device 104 executes the processes of steps S41 to S44 in the same or a similar manner as the processes of steps S21 to S24 shown in FIG. 8.
 In the process of step S45, the first extraction unit 56 acquires the inaudible sound data of the sound section and the section ID from the section detection unit 51, and extracts the fundamental frequency of the audible sound of that sound section from the acquired inaudible sound data.
 In the process of step S46, the second extraction unit 57 acquires, from the buffer 55, the audible sound data included in the sections other than the sound section and the section ID, and extracts the noise-component information based on that data.
 In the process of step S47, the removal unit 58 acquires the section ID and the audible sound data from the section buffer 52, and removes the noise component from the acquired audible sound data based on the noise-component information extracted in the process of step S46.
 In the process of step S48, the emphasis unit 59 strengthens, within the frequency spectrum signal of the audible sound from which the noise component was removed in the process of step S47, the power of the sound at frequencies that are integer multiples of the fundamental frequency extracted in the process of step S45.
 In the process of step S49, the conversion unit 60 converts the frequency spectrum signal of the audible sound processed in step S48 into time-domain audible sound data, and the recognition unit 61 acquires text data of the audible sound by executing speech recognition processing on the time-domain audible sound data.
 In the process of step S50, the recognition unit 61 determines whether the text data of the audible sound acquired in the process of step S49 contains a keyword stored in the storage unit 46. If the recognition unit 61 determines that the text data of the audible sound contains a keyword (step S50: YES), the process proceeds to step S29 shown in FIG. 8. If the recognition unit 61 determines that the text data does not contain a keyword (step S50: NO), the processing device 104 returns to the process of step S41.
 As described above, in the processing device 104 according to this other embodiment, the first extraction unit 56 of the control unit 150 extracts the fundamental-frequency information from the inaudible-range sound data of the sound section, and the emphasis unit 59 of the control unit 150 strengthens the power of the sounds, among the audible sounds of the sound section, whose frequencies are integer multiples of the extracted fundamental frequency. With this configuration, among the audible sounds collected by the sound collector 3, the power of the audible sound output by the output device 2 can be strengthened, which reduces the influence of noise on that audible sound. With the influence of noise reduced, the user can clearly hear the sound output by the output device 2, such as an announcement, when the audible sound data is reproduced. The reduced influence of noise also increases the likelihood that the text data of the audible sound converted by the recognition unit 61 matches the text data of the audible sound, such as an announcement, output by the output device 2; with this configuration, the recognition unit 61 can accurately determine whether the text data of the audible sound contains preset text data.
 Although the present disclosure has been described based on the drawings and embodiments, it should be noted that a person skilled in the art could easily make various variations and modifications based on the present disclosure, and such variations and modifications are therefore included within the scope of the present disclosure. For example, the functions and the like included in each functional unit can be rearranged in any logically consistent way, and a plurality of functional units and the like may be combined into one or divided. In each embodiment, a functional unit, means, step, or the like may be added to another embodiment, or replaced with a functional unit, means, step, or the like of another embodiment, in any logically consistent way. The embodiments according to the present disclosure described above are not limited to being implemented exactly as described; they may be implemented with their features combined or partially omitted as appropriate.
 In the embodiment described above, the audible sound data input from the input unit 11 of the output device 2 and the audible sound data converted by the conversion unit 12 are described as digital sound data. However, they may instead be analog sound data. In this case, the input unit 11 may include a microphone or the like; that is, the user may input the audible sound data as analog data through the microphone of the input unit 11. The extraction unit 19 may then obtain digital audible sound data by sampling the analog audible sound data at a preset sampling rate.
 In the embodiment described above, the sound collector 3 and the processing device 4 are described as separate devices, as shown in FIG. 4, and the sound collector 3 and the processing device 104 are likewise described as separate devices, as shown in FIG. 10. However, the sound collector 3 and the processing device 4 may be configured as an integrated device, and so may the sound collector 3 and the processing device 104. An example of such an integrated configuration is described with reference to FIG. 13, taking the sound collector 3 and the processing device 4 as an example.
 The sound collector 103 shown in FIG. 13 may be an earphone. The sound collector 103 integrates the sound collector 3 and the processing device 4, and can also be regarded as a processing device. The sound collector 103 includes a microphone 30, a speaker 31, a storage unit 33, a filter unit 47, and a control unit 34. The control unit 34 includes an acquisition unit 35, a playback unit 36, a section detection unit 51, a section buffer 52, and a pattern detection unit 53.
 In FIG. 13, upon detecting a sound section, the section detection unit 51 outputs the information on the sound section to the section buffer 52 and the pattern detection unit 53. Based on the information on the sound section acquired from the section detection unit 51, the section buffer 52 extracts, from the audible sound data acquired from the filter unit 47, the audible sound data included in the sound section, and holds the extracted audible sound data of the sound section.
 In FIG. 13, upon acquiring the information on the sound section from the section detection unit 51, the pattern detection unit 53 determines whether the inaudible sound data in the extracted sound section satisfies a preset condition. If the pattern detection unit 53 determines that the condition is satisfied, it outputs a playback trigger to the playback unit 36.
 In FIG. 13, upon acquiring the playback trigger from the pattern detection unit 53, the playback unit 36 acquires the audible sound data of the latest sound section from among the audible sound data of the sound sections held in the section buffer 52, and causes the speaker 31 to output the acquired audible sound data.
 The other configurations and effects of the sound collector 103 are the same as or similar to those of the sound collector 3 and the processing device 4 shown in FIG. 4. When the sound collector 3 and the processing device 104 shown in FIG. 10 are configured as an integrated device, the control unit 34 of the sound collector, similarly to the sound collector 103, may include a section detection unit 51, a section buffer 52, a playback unit 54, a buffer 55, a first extraction unit 56, a second extraction unit 57, a removal unit 58, an emphasis unit 59, a conversion unit 60, and a recognition unit 61.
 一実施形態において、(1)処理装置は、
 可聴域の音の情報を非可聴域の音の成分に含む外部音のデータを取得すると、前記非可聴域の音の成分によって、前記可聴域における音を検出する制御部を備える。
In one embodiment, (1) the processing device includes:
A control unit is provided that detects a sound in the audible range based on the sound component in the inaudible range when external sound data including information on a sound in the audible range is included in a sound component in the inaudible range.
 (2)上記(1)に記載の処理装置において、
 前記制御部は、前記非可聴域の音の成分によって、前記可聴域において音が続く音区間を検出してもよい。
(2) In the processing device described in (1) above,
The control unit may detect a sound section in which the sound continues in the audible range based on the component of the sound in the inaudible range.
 (3)上記(1)又は(2)に記載の処理装置において、
 前記制御部は、前記音区間の可聴域の音のデータを保持してもよい。
(3) In the processing device described in (1) or (2) above,
The control unit may hold data of sounds in an audible range in the sound section.
 (4)上記(1)から(3)までの何れか1つに記載の処理装置において、
 前記制御部は、前記非可聴域の音の成分が予め設定された条件を満たす場合、ユーザに前記条件を満たす音が検出されたことを通知してもよい。
(4) In the processing device according to any one of (1) to (3) above,
If the component of the sound in the inaudible range satisfies a preset condition, the control unit may notify the user that a sound satisfying the condition has been detected.
 (5)上記(4)に記載の処理装置において、
 入力部をさらに備え、
 前記制御部は、前記条件を満たす音が検出されたことを通知した後、再生指示を前記入力部によって受け付けると、前記音区間の可聴域の音のデータを再生する再生処理を実行してもよい。
(5) In the processing device according to (4) above,
It further includes an input section,
When the input unit receives a playback instruction after notifying that a sound satisfying the condition has been detected, the control unit executes a playback process to play back sound data in the audible range of the sound interval. good.
 (6)上記(1)から(5)までの何れか1つに記載の処理装置において、
 前記制御部は、前記非可聴域の音の成分が予め設定された条件を満たす場合、前記可聴域の音を再生する再生処理を実行してよい。
(6) In the processing device according to any one of (1) to (5) above,
The control unit may execute a reproduction process of reproducing the sound in the audible range when the component of the sound in the inaudible range satisfies a preset condition.
 (7)上記(1)から(6)までの何れか1つに記載の処理装置において、
 前記可聴域の音の情報は、前記可聴域の音の基本周波数の情報であってもよい。
(7) In the processing device according to any one of (1) to (6) above,
The information on the sound in the audible range may be information on the fundamental frequency of the sound in the audible range.
 (8)上記(4)から(7)までの何れか1つに記載の処理装置において、
 前記条件は、前記非可聴域の音の周波数スペクトル信号の少なくとも一部が設定パターンと一致するとの条件であり、
 前記設定パターンは、予め設定された可聴音のデータに基づいて生成されてもよい。
(8) In the processing device according to any one of (4) to (7) above,
The condition is that at least a part of the frequency spectrum signal of the inaudible sound matches a set pattern,
The setting pattern may be generated based on preset audible sound data.
 (9)上記(2)又は(3)に記載の処理装置において、
 前記可聴域の音の情報は、前記可聴域の音の基本周波数の情報であり、
 前記制御部は、
 前記音区間の前記非可聴域の音のデータから前記基本周波数の情報を抽出し、
 前記音区間の前記可聴域の音のうち、前記基本周波数の整数倍となる周波数の音のパワーを強めてもよい。
(9) In the processing device according to (2) or (3) above,
The information on the sound in the audible range is information on the fundamental frequency of the sound in the audible range,
The control unit includes:
extracting information on the fundamental frequency from data of the inaudible range sound in the sound interval;
Among the sounds in the audible range of the sound section, the power of sounds with frequencies that are integral multiples of the fundamental frequency may be increased.
 (10)上記(9)に記載の処理装置において、
 前記制御部は、
 前記音区間以外の区間に含まれる前記可聴域の音のデータに基づいて雑音成分の情報を抽出し、
 前記音区間の前記可聴域の音のデータから前記雑音成分を除去してもよい。
(10) In the processing device according to (9) above,
The control unit includes:
extracting noise component information based on data of the audible range sound included in an interval other than the sound interval;
The noise component may be removed from the sound data in the audible range of the sound section.
 (11)上記(9)又は(10)に記載の処理装置において、
 前記制御部は、
 前記音区間の前記可聴域の音のデータに対して音声認識処理を実行することにより、前記音区間の前記可聴域の音のテキストデータを取得し、
 取得した前記テキストデータに予め設定されたキーワードが含まれる場合、前記音区間の可聴域の音のデータを再生する再生処理を実行してもよい。
(11) In the processing device according to (9) or (10) above,
The control unit includes:
Obtaining text data of the sound in the audible range of the sound interval by performing speech recognition processing on the data of the sound in the audible range of the sound interval,
If the acquired text data includes a preset keyword, a reproduction process may be performed to reproduce sound data in the audible range of the sound section.
 (12)上記(9)から(11)までの何れか1つに記載の処理装置において、
 前記制御部は、
 前記音区間の前記可聴域の音のデータに対して音声認識処理を実行することにより、前記音区間の前記可聴域の音のテキストデータを取得し、
 取得した前記テキストデータに予め設定されたキーワードが含まれる場合、ユーザに前記キーワードを含む音が検出されたことを通知してもよい。
(12) In the processing device according to any one of (9) to (11) above,
The control unit includes:
Obtaining text data of the sound in the audible range of the sound interval by performing speech recognition processing on the data of the sound in the audible range of the sound interval,
If the acquired text data includes a preset keyword, the user may be notified that a sound including the keyword has been detected.
 (13)上記(1)から(12)までの何れか1つに記載の処理装置において、
 フィルタ部をさらに備え、
 前記制御部は、前記外部音のデータを取得すると、前記フィルタ部によって前記外部音を前記非可聴域の音のデータと前記可聴域の音のデータとに分けてもよい。
(13) In the processing device according to any one of (1) to (12) above,
Further comprising a filter section,
Upon acquiring the external sound data, the control unit may divide the external sound into data of the inaudible range sound and data of the audible range sound using the filter unit.
 一実施形態において、(14)出力装置は、
 スピーカと、
 可聴域の音のデータを受け付けると、前記可聴域の音の情報を成分に含む非可聴域の音を生成し、前記可聴域の音と前記非可聴域の音とを重畳させた音を前記スピーカによって出力する制御部と、
 を備える。
In one embodiment, (14) the output device includes:
speaker and
When the data of the sound in the audible range is received, a sound in the inaudible range that includes the information on the sound in the audible range is generated, and a sound in which the sound in the audible range and the sound in the inaudible range are superimposed is generated. a control unit that outputs output through a speaker;
Equipped with
In one embodiment, (15) a processing system includes:
an output device that, upon receiving audible-range sound data, generates an inaudible-range sound containing the information of the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and
a processing device that, upon acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
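On the processing-device side of item (15), the simplest detection strategy consistent with the above is to watch the energy of the inaudible band and mark a sound section wherever it stays above a threshold. The sketch below assumes the band split of item (13) has already been applied; the frame length and threshold are illustrative.

```python
import numpy as np

def detect_sound_sections(inaudible, frame_len=1024, threshold=1e-4):
    """Return (begin, end) sample ranges where the inaudible band is active."""
    sections, start = [], None
    for i in range(0, len(inaudible) - frame_len + 1, frame_len):
        active = np.mean(inaudible[i:i + frame_len] ** 2) > threshold
        if active and start is None:
            start = i                      # a sound section begins
        elif not active and start is not None:
            sections.append((start, i))    # the section ends
            start = None
    if start is not None:
        sections.append((start, len(inaudible)))
    return sections
```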
In the present disclosure, designations such as "first" and "second" are identifiers used to distinguish the corresponding elements. Elements distinguished by "first", "second", and the like in the present disclosure may have their identifiers exchanged; for example, the first extraction unit 56 and the second extraction unit 57 may exchange the identifiers "first" and "second". The identifiers are exchanged simultaneously, and the elements remain distinct after the exchange. Identifiers may also be removed, in which case the elements are distinguished by their reference signs. The mere use of identifiers such as "first" and "second" in the present disclosure shall not be used to interpret the order of the elements or as grounds for assuming that an element with a lower-numbered identifier exists.
 1 processing system
 2 output device
 3, 103 sound collector
 4, 104 processing device
 10 input unit
 11 input unit
 12 conversion unit
 13 switch
 14 delay buffer
 15 superimposing unit
 16 speaker
 17 storage unit
 18 control unit
 19 extraction unit
 20 generation unit
 30 microphone
 31 speaker
 32 communication unit
 33 storage unit
 34 control unit
 35 acquisition unit
 36 reproduction unit
 37 accumulation unit
 40 communication unit
 41 input unit
 42 notification unit
 43 display unit
 44 vibration unit
 45 light-emitting unit
 46 storage unit
 47 filter unit
 50, 150 control unit
 51 section detection unit
 52 section buffer
 53 pattern detection unit
 54 reproduction unit
 55 buffer
 56 first extraction unit
 57 second extraction unit
 58 removal unit
 59 emphasis unit
 60 conversion unit
 61 recognition unit

Claims (15)

  1.  A processing device comprising a control unit that, upon acquiring external sound data in which information of an audible-range sound is contained in an inaudible-range sound component, detects the sound in the audible range based on the inaudible-range sound component.
  2.  The processing device according to claim 1, wherein the control unit detects, based on the inaudible-range sound component, a sound section in which sound continues in the audible range.
  3.  The processing device according to claim 2, wherein the control unit holds the audible-range sound data of the sound section.
  4.  The processing device according to claim 3, wherein, when the inaudible-range sound component satisfies a preset condition, the control unit notifies a user that a sound satisfying the condition has been detected.
  5.  The processing device according to claim 4, further comprising an input unit,
      wherein, after notifying that a sound satisfying the condition has been detected, the control unit executes a reproduction process of reproducing the audible-range sound data of the sound section when a reproduction instruction is received through the input unit.
  6.  The processing device according to claim 1, wherein the control unit executes a reproduction process of reproducing the audible-range sound when the inaudible-range sound component satisfies a preset condition.
  7.  The processing device according to any one of claims 4 to 6, wherein the information of the audible-range sound is information on a fundamental frequency of the audible-range sound.
  8.  The processing device according to claim 7, wherein the condition is that at least a part of a frequency spectrum signal of the inaudible-range sound matches a set pattern, and
      the set pattern is generated based on preset audible sound data.
  9.  The processing device according to claim 2, wherein the information of the audible-range sound is information on a fundamental frequency of the audible-range sound, and
      the control unit extracts the information on the fundamental frequency from the inaudible-range sound data of the sound section, and
      increases the power of those sounds, among the audible-range sounds of the sound section, whose frequencies are integral multiples of the fundamental frequency.
  10.  The processing device according to claim 9, wherein the control unit
      extracts information on a noise component based on the audible-range sound data contained in sections other than the sound section, and
      removes the noise component from the audible-range sound data of the sound section.
  11.  The processing device according to claim 9 or 10, wherein the control unit
      obtains text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and
      executes a reproduction process of reproducing the audible-range sound data of the sound section when the obtained text data contains a preset keyword.
  12.  The processing device according to claim 9 or 10, wherein the control unit
      obtains text data of the audible-range sound of the sound section by performing speech recognition processing on the audible-range sound data of the sound section, and
      notifies a user that a sound containing the keyword has been detected when the obtained text data contains a preset keyword.
  13.  The processing device according to claim 1 or 9, further comprising a filter unit,
      wherein, upon acquiring the external sound data, the control unit uses the filter unit to separate the external sound into inaudible-range sound data and audible-range sound data.
  14.  An output device comprising:
      a speaker; and
      a control unit that, upon receiving audible-range sound data, generates an inaudible-range sound containing the information of the audible-range sound as a component, and outputs through the speaker a sound in which the audible-range sound and the inaudible-range sound are superimposed.
  15.  A processing system comprising:
      an output device that, upon receiving audible-range sound data, generates an inaudible-range sound containing the information of the audible-range sound as a component, and outputs a sound in which the audible-range sound and the inaudible-range sound are superimposed; and
      a processing device that, upon acquiring external sound data, detects the sound in the audible range based on the inaudible-range sound component.
PCT/JP2023/033103 2022-09-15 2023-09-11 Processing device, output device, and processing system WO2024058147A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-147250 2022-09-15
JP2022147250 2022-09-15

Publications (1)

Publication Number Publication Date
WO2024058147A1 true WO2024058147A1 (en) 2024-03-21

Family

ID=90274996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/033103 WO2024058147A1 (en) 2022-09-15 2023-09-11 Processing device, output device, and processing system

Country Status (1)

Country Link
WO (1) WO2024058147A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000207170A (en) * 1999-01-14 2000-07-28 Sony Corp Device and method for processing information
US20140270194A1 (en) * 2013-03-12 2014-09-18 Comcast Cable Communications, Llc Removal of audio noise
JP2018185401A (en) * 2017-04-25 2018-11-22 トヨタ自動車株式会社 Voice interactive system and voice interactive method
WO2021059497A1 (en) * 2019-09-27 2021-04-01 日本電気株式会社 Audio signal processing device, audio signal processing method, and storage medium

Similar Documents

Publication Publication Date Title
US10685638B2 (en) Audio scene apparatus
KR101606966B1 (en) Systems, methods, apparatus, and computer-readable media for spatially selective audio augmentation
CN108156550B (en) Playing method and device of headset
US10224019B2 (en) Wearable audio device
JP5493611B2 (en) Information processing apparatus, information processing method, and program
US11948561B2 (en) Automatic speech recognition imposter rejection on a headphone with an accelerometer
CN110708625A (en) Intelligent terminal-based environment sound suppression and enhancement adjustable earphone system and method
JP6931819B2 (en) Voice processing device, voice processing method and voice processing program
CN112352441B (en) Enhanced environmental awareness system
JP2015206989A (en) Information processing device, information processing method, and program
JP2016535305A (en) A device for improving language processing in autism
JP6268033B2 (en) Mobile device
JPWO2008007616A1 (en) Non-voice utterance input warning device, method and program
WO2024058147A1 (en) Processing device, output device, and processing system
JP7284570B2 (en) Sound reproduction system and program
CN103295571A (en) Control using time and/or spectrally compacted audio commands
CN110782887A (en) Voice signal processing method, system, device, equipment and computer storage medium
JP2007187748A (en) Sound selective processing device
US10805710B2 (en) Acoustic device and acoustic processing method
US20230239617A1 (en) Ear-worn device and reproduction method
US20200111505A1 (en) Information processing apparatus and information processing method
JP2012194295A (en) Speech output system
JP6766981B2 (en) Broadcast system, terminal device, broadcasting method, terminal device operation method, and program
CN115580678A (en) Data processing method, device and equipment
CN113823278A (en) Voice recognition method and device, electronic equipment and storage medium