WO2018082315A1 - Audio playing method, system and apparatus - Google Patents
Audio playing method, system and apparatus
- Publication number
- WO2018082315A1 (PCT/CN2017/089207, CN2017089207W)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/16—Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor ; Earphones; Monophonic headphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
Definitions
- the present invention relates to the field of audio processing and playback technologies, and more particularly to an audio playback method, system and apparatus.
- the patent application with the application number 105097001A discloses an audio playing method and apparatus, wherein the method comprises: collecting an external sound signal, and recognizing the collected sound signal; when it is determined from the recognition result that the collected sound signal corresponds to a corresponding audio playback control command, performing, for an audio file pre-stored in the audio playback device, the audio playback operation of the corresponding audio file according to the audio playback control command; and, when the audio playback device outputs an audio signal based on the audio playback operation, generating a mechanical vibration of a corresponding frequency based on the audio signal.
- although that application uses a bone conduction module, the module is only applied to audio playback, and the collected sound command is not processed, so the obtained audio command is not clear and a correct control command cannot be obtained. At the same time, that application can only play the audio files stored in the local storage, and cannot obtain more audio files through the network.
- to solve the above problems, the present invention provides an audio playing method, system and device: deep processing of the incoming sound signal makes the sound signal clearer and more accurate, and audio files can be downloaded from a cloud server through a network and played anytime, anywhere.
- a first aspect of the present invention provides an audio playback method, which comprises opening an input module and includes the following steps:
- Step 1 obtaining an operation instruction by using the input module
- Step 2 parsing the operation instruction and generating a control instruction
- Step 3 Execute the control instruction to obtain an audio file.
- Step 4 Play the audio file.
- the input module includes at least one of an audio input module, a text input module, and a gesture input module; the audio input module generates an effective audio command signal after receiving an audio command signal; and the text input module generates a text command signal.
- the gesture input module generates a gesture instruction signal.
- the audio input module comprises at least one microphone and a bone conduction microphone.
- the audio signal comprises a first audio signal and a second audio signal.
- the first audio signal refers to collecting mechanical waves generated by vibration of a user's body using the bone conduction microphone.
- the second audio signal is the acoustic wave collected by the microphone within the time range in which the mechanical wave is generated.
- the method of acquiring the operation instruction through the audio input module includes the following sub-steps:
- Step 11 Perform audio characteristic detection on the collected audio command signal.
- Step 12 Perform a primary sound source determination
- Step 13 Eliminate noise
- Step 14 Output the valid audio command signal.
- the audio characteristic detection includes at least one of voice detection, noise detection, and correlation feature extraction.
- the audio characteristic detection method extracts audio data x_i(n) with a frame length of T ms each time, and calculates the average energy E_i, the zero-crossing rate ZCR_i, the short-time correlation R_i and the short-time cross-correlation C_ij(k).
- the method further calculates the non-silence probability of the current frame by normalizing E_i·ZCR_i by an empirical per-channel reference value, and the speech probability by normalizing max_k[R_i(k)]·max_k[C_ij(k)] by an empirical per-channel reference value.
- the method further determines, from the non-silence probability and the speech probability of the current frame of channel i, the type of the current frame, i.e. whether it is a noise frame, a speech frame or a noiseless environment sound frame; the decision thresholds are empirical values, where Ambient denotes a noiseless environment sound frame, Noise a noise frame, and Speech a speech frame.
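The feature computation and frame-type decision above can be sketched as follows. The four features use textbook definitions; the per-channel reference values `e_ref` and `corr_ref` and the decision thresholds stand in for the patent's empirical values and are hypothetical:

```python
import numpy as np

def frame_features(x_i, x_j):
    """Per-frame features for channel i, given the same frame from channel j."""
    n = len(x_i)
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    energy = np.mean(x_i ** 2)                            # average energy E_i
    zcr = np.mean(np.abs(np.diff(np.sign(x_i))) > 0)      # zero-crossing rate ZCR_i
    # short-time autocorrelation R_i(k) and cross-correlation C_ij(k), k = 0..n-1
    r = np.correlate(x_i, x_i, mode="full")[n - 1:] / n
    c = np.correlate(x_i, x_j, mode="full")[n - 1:] / n
    return energy, zcr, r, c

def classify_frame(energy, zcr, r, c,
                   e_ref=1e4, corr_ref=1e6, speech_thr=0.5, silence_thr=0.2):
    """Map features to Ambient / Noise / Speech; all constants are hypothetical."""
    p_nonsilent = min(1.0, energy * zcr / e_ref)           # E_i * ZCR_i, normalized
    p_speech = min(1.0, np.max(r) * np.max(c) / corr_ref)  # max R_i * max C_ij, normalized
    if p_nonsilent < silence_thr:
        return "Ambient"                                   # noiseless environment sound frame
    return "Speech" if p_speech > speech_thr else "Noise"
```

The normalization by products of feature maxima follows the empirical per-channel reference values mentioned in Embodiment 1.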
- the step 12 is to determine the main data path according to the primary sound source determination principle.
- the principle of determining the primary sound source comprises: 1) when one channel is a Speech frame and the other is an Ambient or Noise frame, that channel is determined as the main data path of the current position frame; 2) when one channel is an Ambient frame and the other is a Noise frame, that channel is determined as the main data path of the current position frame.
- the step 13 is to obtain the noise spectrum characteristics from the Noise frames associated before and after the Speech data frames of the main data path, and to suppress the noise spectrum components of the Speech audio frames in the frequency domain.
- the operation instruction includes at least one of the valid audio command signal, the text command signal, or the gesture command signal.
- control instruction comprises at least one of a search instruction, a screening instruction, a cache instruction, a download instruction, a storage instruction, and a play instruction.
- the search instruction refers to searching preferentially in the local storage and, if the file is not found, searching in the cloud through the communication component.
- the communication component comprises at least one of wifi, wireless, 2G/3G/4G/5G, and GPRS.
- the acquiring the audio file refers to executing the cache instruction or the download instruction, and obtaining the audio file from the cloud by using the communication component.
- the play instruction refers to playing a cached audio file or an audio file in the local storage through a playback device.
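The search/cache/download behaviour described above, local storage first and then the cloud via the communication component, can be sketched as follows; `local_store` and `cloud_fetch` are hypothetical stand-ins for the local storage module and the cloud service:

```python
def search_audio(title, local_store, cloud_fetch):
    """Return the audio file for `title`, preferring local storage.

    local_store: dict mapping title -> audio bytes (hypothetical interface)
    cloud_fetch: callable title -> audio bytes or None (hypothetical interface)
    """
    if title in local_store:                 # search instruction: local first
        return local_store[title]
    data = cloud_fetch(title)                # cache / download instruction via the cloud
    if data is not None:
        local_store[title] = data            # storage instruction: keep a local copy
    return data
```

A usage example: a title already in local storage is returned without touching the cloud, while a cloud hit is cached locally for the next search.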
- a second aspect of the invention discloses a sound collection system, comprising an input module and further comprising the following modules:
- an operation instruction acquisition module, configured to acquire an operation instruction through the input module;
- an operation instruction parsing module, configured to parse the operation instruction and generate a control instruction;
- an audio file acquisition module, configured to execute the control instruction to acquire an audio file;
- an audio file playing module, configured to push the valid audio data of the audio file to the terminal device.
- the input module includes at least one of an audio input module, a text input module, and a gesture input module; the audio input module generates an effective audio command signal after receiving an audio command signal; and the text input module generates a text command signal.
- the gesture input module generates a gesture instruction signal.
- the audio input module comprises at least one bone conduction microphone and at least one microphone.
- the audio signal comprises a first audio signal and a second audio signal.
- the first audio signal refers to collecting mechanical waves generated by vibration of a user's body using the bone conduction microphone.
- the second audio signal is the acoustic wave collected by the microphone within the time range in which the mechanical wave is generated.
- the operation instruction acquisition module further includes the following sub-modules:
- the audio characteristic detection sub-module is configured to perform audio characteristic detection on the collected audio signal;
- the primary sound source determination sub-module is configured to perform primary sound source determination;
- the noise reduction sub-module is configured to eliminate noise;
- the audio command output sub-module is configured to output the valid audio command signal.
- the audio characteristic detection includes at least one of voice detection, noise detection, and correlation feature extraction.
- the audio characteristic detection method extracts audio data x_i(n) with a frame length of T ms each time, and calculates the average energy E_i, the zero-crossing rate ZCR_i, the short-time correlation R_i and the short-time cross-correlation C_ij(k).
- the method further calculates the non-silence probability of the current frame by normalizing E_i·ZCR_i by an empirical per-channel reference value, and the speech probability by normalizing max_k[R_i(k)]·max_k[C_ij(k)] by an empirical per-channel reference value.
- the method further determines, from the non-silence probability and the speech probability of the current frame of channel i, the type of the current frame, where Ambient denotes a noiseless environment sound frame, Noise a noise frame, and Speech a speech frame.
- the primary sound source determination sub-module is configured to determine the main data path according to the primary sound source determination principle.
- the principle of determining the primary sound source comprises: 1) when one channel is a Speech frame and the other is an Ambient or Noise frame, that channel is determined as the main data path of the current position frame; 2) when one channel is an Ambient frame and the other is a Noise frame, that channel is determined as the main data path of the current position frame.
- the noise reduction sub-module is configured to obtain the noise spectrum characteristics from the Noise audio frames associated before and after the Speech data frames of the main data path, effectively suppress the noise spectrum components of the Speech audio frames in the frequency domain, and obtain relatively pure voice data.
- the operation instruction includes at least one of the valid audio command signal, the text command signal, or the gesture command signal.
- control instruction comprises at least one of a search instruction, a screening instruction, a cache instruction, a download instruction, a storage instruction, and a play instruction.
- the search instruction refers to searching preferentially in the local storage and, if the file is not found, searching in the cloud through the communication component.
- the communication component comprises at least one of wifi, wireless, 2G/3G/4G/5G, and GPRS.
- the play instruction refers to playing a cached audio file or an audio file in the local storage through a playback device.
- a third aspect of the invention discloses a sound collecting device comprising a housing, further comprising the system of any of the above.
- the sound collection device is fixedly mounted on the smart device.
- the smart device comprises at least one of a smart phone, a smart camera, a smart earphone, and other smart devices.
- the invention realizes high-definition voice command input through the processing of the audio signal, liberating the hands, making the wearable device more convenient in application and closer to people's usage habits.
- FIG. 1 is a flow chart of a preferred embodiment of an audio playback method in accordance with the present invention.
- FIG. 2 is a block diagram of a preferred embodiment of an audio playback system in accordance with the present invention.
- FIG. 3 is a schematic cross-sectional view of an embodiment of a bone conduction microphone of an audio playback device in accordance with the present invention.
- FIG. 4 is a block diagram showing an embodiment of a smart earphone of an audio playback device in accordance with the present invention.
- Figure 5 is a flow chart showing an embodiment of a noise reduction method of an audio playback method in accordance with the present invention.
- FIG. 6 is a flow chart showing an embodiment of a method for initializing a dialect recognition module of an audio playing method according to the present invention.
- Figure 7 is a flow chart showing an embodiment of a dialect recognition method of an audio playback method in accordance with the present invention.
- step 100 is executed to open the input module 200 (including the audio input module 201, the handwriting input module 202, and the keyboard input module 203).
- Step 110 is executed to determine the input module type. If the input module is the audio input module 201 (including a bone conduction microphone and a microphone), step 120 is performed: the audio characteristic detecting submodule 211 performs audio characteristic detection (including voice detection, noise detection, and correlation feature extraction) on the input audio signal (including the first audio signal collected from the microphone and the second audio signal collected from the bone conduction microphone).
- the steps of audio characteristic detection are as follows: 1) extract audio data x_i(n) with a frame length of 20 ms, and calculate the average energy E_i, the zero-crossing rate ZCR_i, the short-time correlation R_i and the short-time cross-correlation C_ij(k); 2) from the average energy E_i, the zero-crossing rate ZCR_i, the short-time correlation R_i and the short-time cross-correlation C_ij(k), calculate the non-silence probability and the speech probability of the current frame, where the non-silence probability is referenced to an empirical value of max(E_i·ZCR_i) for channel i and the speech probability to an empirical value of max{max[R_i(k)]·max[C_ij(k)]} for channel i; 3) from these probabilities, determine the type of the current frame.
- Step 121 is executed: the primary sound source determining sub-module 212 determines, based on the feature values and the frame-type decision of the current frame, which channel's current frame is taken as the primary sound source of the current position frame.
- the determination method is as follows: 1) when one channel is a Speech (speech) frame and the other is an Ambient (noiseless environment sound) frame or a Noise (noise) frame, that channel is determined as the main data path of the current position frame; 2) when one channel is an Ambient frame and the other is a Noise frame, that channel is determined as the main data path of the current position frame; 3) when both channels carry the same type of frame, the channel with the larger feature value is determined as the main data path of the current position frame.
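The three decision rules above can be sketched as a minimal two-channel selector; the per-frame "value" used in rule 3 is assumed here to be the speech probability, since the text does not name the exact quantity:

```python
# Rank of frame types for rules 1 and 2: Speech beats Ambient and Noise,
# and Ambient beats Noise; equal types fall through to rule 3, which picks
# the channel with the larger per-frame value.
RANK = {"Speech": 2, "Ambient": 1, "Noise": 0}

def pick_main_path(type_a, value_a, type_b, value_b):
    """Return 'A' or 'B': which channel supplies the current position frame."""
    if RANK[type_a] != RANK[type_b]:
        return "A" if RANK[type_a] > RANK[type_b] else "B"
    return "A" if value_a >= value_b else "B"   # rule 3: larger value wins
```

For example, a Speech frame on one channel wins regardless of the other channel's value, while two Noise frames are compared by value.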
- Step 122 is executed: since the primary sound source still contains a small amount of noise, the noise reduction sub-module 213 obtains the noise spectrum characteristics from the noise frames associated before and after the speech frames of the main data path, and suppresses the noise spectrum components of the speech frames in the frequency domain.
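The frequency-domain suppression in step 122 reads as spectral subtraction; below is a sketch under that reading, where the noise magnitude spectrum is estimated from the surrounding Noise frames and the `floor` parameter (a spectral floor that avoids negative magnitudes) is hypothetical:

```python
import numpy as np

def suppress_noise(speech_frame, noise_frames, floor=0.02):
    """Subtract the average noise magnitude spectrum from a speech frame."""
    # noise spectrum characteristic: mean magnitude over the associated Noise frames
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)
    spec = np.fft.rfft(speech_frame)
    # suppress the noise spectrum components, clamped to a small spectral floor
    mag = np.maximum(np.abs(spec) - noise_mag, floor * noise_mag)
    # resynthesize with the original phase
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(speech_frame))
```

With an all-zero noise estimate the frame passes through unchanged, which is a useful sanity check on the reconstruction.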
- Step 123 is executed to output a voice operation instruction.
- when the input module type is text input, step 130 is executed to determine the text input type. If it is handwriting input, step 131 is executed: the handwritten character judgment sub-module 215 determines the type of the handwritten text and recognizes characters and numbers; then step 132 is executed: the handwritten character error correction sub-module 216 intelligently corrects typos based on the characters and numbers obtained from the handwritten character judgment sub-module 215, obtaining a relatively accurate text command; step 133 is then executed to output a text operation command.
- if it is keyboard input, step 132 is executed: the keyboard text confirmation sub-module 218 confirms the input text and performs intelligent error correction to obtain a relatively accurate text command; step 133 is then executed to output a text operation command.
- Step 140 is executed, and the operation instruction analysis module 220 parses the obtained voice operation instruction or the text operation instruction, and generates a control instruction (including a search instruction, a filter instruction, a cache instruction, a download instruction, a storage instruction, a play instruction, and the like).
- Step 150 is executed: the audio file acquisition module 230 executes the control instruction, preferentially against the local storage module 231; when the local storage module 231 cannot fulfil it, the audio file is downloaded through the network module 232.
- Step 160 is executed: the audio file playing module 240 plays the audio file through the audio output device.
- the housing number is 301
- the vibration collector is labeled 302
- the pressure sensor is 303
- the signal processor is labeled 304
- the vibration chamber is labeled 305
- the wire number is 306
- the circuit board number is 307
- the base number is 308.
- the signal acquisition part is labeled 309.
- a bone conduction microphone 10, as shown in FIG. 3, includes a housing 301, a vibration collector 302, a pressure sensor 303, a signal processor 304, a wire 306, and a circuit board 307.
- the outer casing 301 is coupled to the vibration harvester 302 to form an enclosed space.
- the circuit board 307 is disposed at the bottom of the outer casing 301 within the enclosed space, and the signal processor 304 is disposed on the circuit board 307 and electrically connected to it.
- the pressure sensor 303 is disposed between the circuit board 307 and the vibration collector 302 in the closed space, and is fixedly connected to the outer casing 301.
- the pressure sensor 303 is electrically connected to the circuit board 307 via a wire 306.
- the outer casing 301 is at least partially made of an elastic material.
- the pressure sensor 303 is a curved surface that protrudes downward.
- Non-planar pressure sensors, especially those with curved surfaces, are more sensitive to the vibration of the sound source and are beneficial to the collection of sound sources.
- the pressure sensor end 312 is coupled to a connection portion provided on the outer casing 301.
- the connecting portion is a recess, and the pressure sensor end portion 312 is engaged with it.
- a sealant is applied at the joint between the connecting portion and the pressure sensor end portion 312 to improve the airtightness of the vibration chamber 305 and to reduce or avoid the sound loss caused by air leakage.
- the vibration collector 302 includes a signal acquisition unit 309 and a first connection portion 310.
- the housing 301 includes a second connection portion 311; as shown in FIG. 3, the first connection portion 310 and the second connection portion 311 are fixedly connected and joined by a sealant into a sealed whole.
- the fixed connection is a snap connection.
- the first connecting portion 310 is a concave portion
- the second connecting portion 311 is a convex portion
- the first connecting portion 310 is a convex portion
- the second connecting portion 311 is a concave portion.
- the concave portion is snap-fitted with the convex portion.
- the vibration collector 302 is made of an elastic material.
- the signal acquisition portion 309 is composed of a plurality of protrusions that are convex upward.
- the protrusions are connected as a unitary body.
- the protrusion is a thin-walled curved surface.
- the protrusions are distributed on the surface of the vibration collector 302.
- the vibration collector 302 forms a closed cavity with at least the pressure sensor 303.
- the cavity is a vibration chamber 305.
- this embodiment further includes a base 308 that is integrally coupled to the outer casing 301.
- the circuit board 307 is disposed on the base 308.
- the signal processor 304 is disposed on the circuit board 307.
- the pressure sensor 303 is coupled to the circuit board 307 via the wire 306.
- a headset 400 incorporating an acoustic acquisition system including a left earphone 410 and a right earphone 430.
- the core components of the sound collection system are concentrated in the left earphone 410, including a 3G/4G network module numbered 420, wifi/Bluetooth numbered 421, an LCD display/touch screen numbered 422, an acceleration sensor/gyroscope numbered 423, a GPS numbered 424, a bone conduction microphone (left) numbered 425, a speaker (left) numbered 426, audio signal processing (DAC) numbered 427, local data storage numbered 428, and a CPU numbered 429.
- the 3G/4G network, wifi/Bluetooth, LCD display/touch screen, acceleration sensor/gyroscope, GPS, audio signal processing (DAC) and local data storage are each connected to the CPU; the bone conduction microphone (left) and the speaker (left) are connected to the audio signal processing (DAC).
- the auxiliary components are concentrated in the right earphone 430, including a speaker (right) numbered 440, sensors numbered 441 and 443, a trackpad music control numbered 442, a bone conduction microphone (right) numbered 444, and a battery numbered 445.
- the speaker (right), sensor, trackpad music control and battery are connected to the CPU in the left earphone, and the bone conduction microphone (right) is connected to the speaker (right).
- step 500 is performed to import the main audio data.
- Step 510 is executed to retrieve the environment determination data stored in the memory.
- Step 520 is executed to compare the main audio data with the environment determination data, and determine the noise environment around the main audio input.
- Step 530 and step 540 are sequentially executed to retrieve the environmental noise data from the memory and perform single frame comparison with the main audio data.
- Step 550 is executed to remove the same audio data as the ambient noise data in the single audio data frame.
- Step 560 is executed to generate valid audio data without noise.
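Steps 500-560 can be sketched as a profile-matching noise remover: pick the stored environment whose noise spectrum best matches the input (step 520), then remove that noise component frame by frame (steps 530-550). The profile format (environment name mapped to a noise magnitude spectrum) and the matching metric are assumptions:

```python
import numpy as np

def denoise_with_profiles(main_audio, env_profiles, frame_len=160):
    """Return (matched environment, valid audio data) per steps 500-560."""
    frames = [main_audio[i:i + frame_len]
              for i in range(0, len(main_audio) - frame_len + 1, frame_len)]
    # step 520: determine the surrounding noise environment by nearest profile
    avg_mag = np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)
    env = min(env_profiles,
              key=lambda k: np.linalg.norm(avg_mag - env_profiles[k]))
    noise_mag = env_profiles[env]
    # steps 530-550: single-frame removal of the environmental noise component
    out = []
    for f in frames:
        spec = np.fft.rfft(f)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out.append(np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame_len))
    return env, np.concatenate(out)          # step 560: valid audio data
```

With a zero-noise "quiet" profile, the input survives the round trip unchanged, which exercises the matching and reconstruction paths.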
- the audio playback system also includes a dialect recognition module for recognizing the dialects collected by the audio input module.
- step 600 and step 610 are sequentially executed, the dialect recognition module initialization process is started, and the corresponding voice is input according to the prompt.
- Step 620 is executed to connect to the cloud server through the network module according to the input voice, and check whether it is saved in the existing dialect library. If it has been saved in the existing dialect library, step 630 is executed to retrieve and download the dialect library.
- Step 640 is executed: the corresponding voice is input according to the prompts and compared against the dialect library downloaded to local storage for correction, and the dialect library is fine-tuned according to the user's own speaking habits.
- Step 650 is executed and saved in the local storage.
- if the dialect is not saved in the existing dialect library, step 621 is executed: the voice is input through the audio input module, and the corresponding text is input through the handwriting input module or the keyboard input module.
- step 622 is performed, all common language proofreading inputs are completed and saved in the local storage.
- Step 623 is executed to upload to the dialect library of the cloud server.
- step 600 is performed to input voice through the voice input module.
- Step 610 is executed to determine whether the dialect corresponding to the voice is saved in the local storage. If it is saved in the local storage, steps 620 and 650 are sequentially executed to retrieve the dialect library in the local storage and perform dialect comparison.
- Step 660 is executed to generate a control instruction according to the dialect comparison result.
- otherwise, step 630 is performed: a dialect retrieval comparison is performed in the cloud server to determine a suitable dialect library.
- step 640 and step 650 are sequentially executed, and the corresponding dialect library is downloaded through the network module and dialect comparison is performed.
- Step 660 is executed to generate a control instruction according to the dialect comparison result.
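The dialect flow of steps 600-660 reduces to a local-first library lookup with a cloud fallback; the storage and cloud interfaces below are hypothetical stand-ins:

```python
def recognize_dialect(voice, local_libs, cloud_search, compare):
    """Return a control instruction derived from the dialect comparison.

    voice:        dict with the detected dialect and the spoken content (assumed shape)
    local_libs:   dict mapping dialect name -> dialect library
    cloud_search: callable dialect name -> dialect library (cloud retrieval)
    compare:      callable (voice, library) -> control instruction
    """
    dialect = voice["dialect"]
    lib = local_libs.get(dialect)            # steps 610/620: local storage first
    if lib is None:
        lib = cloud_search(dialect)          # steps 630/640: cloud retrieval + download
        local_libs[dialect] = lib            # step 650: save to local storage
    return compare(voice, lib)               # step 660: generate the control instruction
```

On a cache miss the downloaded library is saved locally, so the next utterance in the same dialect skips the cloud round trip.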
- the methods, apparatus and systems of the present invention may be implemented in a number of ways.
- the methods and systems of the present invention can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
- the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless otherwise specifically stated.
- the invention may also be embodied as a program recorded in a recording medium, the program comprising machine readable instructions for implementing the method according to the invention.
- the invention also covers a recording medium storing a program for performing the method according to the invention.
Abstract
Provided are an audio playing method, system and apparatus, wherein the method comprises opening an input module and further comprises the following steps: acquiring an operation instruction through the input module; parsing the operation instruction and generating a control instruction; executing the control instruction to acquire an audio file; and playing the audio file. By inputting control instructions by voice to control audio playback, the method can be applied to various smart devices and wearable devices, freeing the user's hands and improving the user experience.
Description
The present invention relates to the field of audio processing and playback technologies, and in particular to an audio playing method, system and apparatus.
Traditionally, people listen to music through earphones, most of which are wired to a player that provides the audio; newer Bluetooth earphones can connect to the player wirelessly. The so-called player, from early tape players and CD players to the iPod players, smartphones, tablets and other smart terminals that appeared later, and even the long-popular PC, has kept changing in type, yet has always been responsible for storing and outputting the audio signal while, in most cases, receiving user operations and controlling playback. Such a combination now appears very inconvenient. With the development of smart wearable devices and the continuous improvement of living standards, smart wearable devices such as smart watches are increasingly popular and have become an indispensable communication tool in people's lives.
However, most existing wearable devices still require manual operation before music can be played normally. How to obtain a simple, efficient and highly operable experience while occupying the hands as little as possible is a problem that wearable devices urgently need to solve.
The patent application with application number 105097001A discloses an audio playing method and apparatus, wherein the method comprises: collecting an external sound signal and recognizing the collected sound signal; when it is determined from the recognition result that the collected sound signal corresponds to a corresponding audio playback control command, performing, for an audio file pre-stored in the audio playing apparatus, the audio playback operation of the corresponding audio file according to the audio playback control command; and, when the audio playing apparatus outputs an audio signal based on the audio playback operation, generating a mechanical vibration of a corresponding frequency according to the audio signal. Although that application uses a bone conduction module, the module is only applied to audio playback, and the collected sound command is not processed, so the obtained audio command is not clear and a correct control command cannot be obtained. Moreover, that application can only play audio files stored in local storage and cannot obtain more audio files through the network.
Summary of the Invention
To solve the above technical problems, the present invention proposes an audio playing method, system and apparatus. By deeply processing the incoming sound signal, the sound signal becomes clearer and more accurate; at the same time, audio files can be downloaded from a cloud server through the network, so that audio files can be played anytime and anywhere.
A first aspect of the present invention provides an audio playing method, which comprises opening an input module and comprises the following steps:
Step 1: acquiring an operation instruction through the input module;
Step 2: parsing the operation instruction and generating a control instruction;
Step 3: executing the control instruction to acquire an audio file;
Step 4: playing the audio file.
Preferably, the input module includes at least one of an audio input module, a text input module and a gesture input module; the audio input module generates a valid audio command signal after receiving an audio command signal; the text input module generates a text command signal; the gesture input module generates a gesture command signal.
In any of the above schemes, preferably, the audio input module includes at least one microphone and one bone conduction microphone.
In any of the above schemes, preferably, the audio signal includes a first audio signal and a second audio signal.
In any of the above schemes, preferably, the first audio signal refers to the mechanical wave, generated by the vibration of the user's body, collected by the bone conduction microphone.
In any of the above schemes, preferably, the second audio signal refers to the acoustic wave collected by the microphone within the time range in which the mechanical wave is generated.
In any of the above schemes, preferably, the method of acquiring the operation instruction through the audio input module includes the following sub-steps:
Step 11: performing audio characteristic detection on the collected audio command signal;
Step 12: performing primary sound source determination;
Step 13: eliminating noise;
Step 14: outputting the valid audio command signal.
In any of the above schemes, preferably, the audio characteristic detection includes at least one of voice detection, noise detection and correlation feature extraction.
In any of the above schemes, preferably, the audio characteristic detection method extracts audio data x_i(n) with a frame length of T ms each time, and calculates the average energy E_i, the zero-crossing rate ZCR_i, the short-time correlation R_i and the short-time cross-correlation C_ij(k).
In any of the above schemes, preferably, step 12 is to determine the main data path according to the primary sound source determination principle.
In any of the above schemes, preferably, the primary sound source determination principle includes:
1) when one channel is Speech and the other is Ambient or Noise, that channel is determined as the main data path of the current position frame;
2) when one channel is Ambient and the other is Noise, that channel is determined as the main data path of the current position frame.
In any of the above schemes, preferably, step 13 is to obtain the noise spectrum characteristics from the Noise frames associated before and after the Speech audio frames of the main data path, and to suppress the noise spectrum components of the Speech audio frames in the frequency domain.
In any of the above schemes, preferably, the operation instruction includes at least one of the valid audio command signal, the text command signal and the gesture command signal.
In any of the above schemes, preferably, the control instruction includes at least one of a search instruction, a filter instruction, a cache instruction, a download instruction, a storage instruction and a play instruction.
In any of the above schemes, preferably, the search instruction refers to searching preferentially in the local storage and, if the file is not found, searching in the cloud through the communication component.
In any of the above schemes, preferably, the communication component includes at least one of wifi, wireless, 2G/3G/4G/5G and GPRS.
In any of the above schemes, preferably, acquiring the audio file refers to executing the cache instruction or the download instruction to obtain the audio file from the cloud through the communication component.
In any of the above schemes, preferably, the play instruction refers to playing a cached audio file or an audio file in the local storage through a playback device.
A second aspect of the present invention discloses a sound collection system, which includes an input module and further includes the following modules:
an operation instruction acquisition module: acquires an operation instruction through the input module;
an operation instruction parsing module: parses the operation instruction and generates a control instruction;
an audio file acquisition module: executes the control instruction to acquire an audio file;
an audio file playing module: pushes the valid audio data of the audio file to the terminal device.
Preferably, the input module includes at least one of an audio input module, a text input module and a gesture input module; the audio input module generates a valid audio command signal after receiving an audio command signal; the text input module generates a text command signal; the gesture input module generates a gesture command signal.
In any of the above schemes, preferably, the audio input module includes at least one bone conduction microphone and at least one microphone.
In any of the above schemes, preferably, the audio signal includes a first audio signal and a second audio signal.
In any of the above schemes, preferably, the first audio signal refers to the mechanical wave, generated by the vibration of the user's body, collected by the bone conduction microphone.
In any of the above schemes, preferably, the second audio signal refers to the acoustic wave collected by the microphone within the time range in which the mechanical wave is generated.
In any of the above schemes, preferably, the operation instruction acquisition module further includes the following sub-modules:
an audio characteristic detection sub-module: performs audio characteristic detection on the collected audio signal;
a primary sound source determination sub-module: performs primary sound source determination;
a noise reduction sub-module: eliminates noise;
an audio command output sub-module: outputs the valid audio command signal.
In any of the above schemes, preferably, the audio characteristic detection includes at least one of voice detection, noise detection and correlation feature extraction.
In any of the above schemes, preferably, the primary sound source determination sub-module has the function of determining the main data path according to the primary sound source determination principle.
In any of the above schemes, preferably, the primary sound source determination principle includes:
1) when one channel is Speech and the other is Ambient or Noise, that channel is determined as the main data path of the current position frame;
2) when one channel is Ambient and the other is Noise, that channel is determined as the main data path of the current position frame.
In any of the above schemes, preferably, the noise reduction sub-module has the function of obtaining the noise spectrum characteristics from the Noise audio frames associated before and after the Speech audio frames of the main data path, effectively suppressing the noise spectrum components of the Speech audio frames in the frequency domain, and obtaining relatively pure voice data.
In any of the above schemes, preferably, the operation instruction includes at least one of the valid audio command signal, the text command signal and the gesture command signal.
In any of the above schemes, preferably, the control instruction includes at least one of a search instruction, a filter instruction, a cache instruction, a download instruction, a storage instruction and a play instruction.
In any of the above schemes, preferably, the search instruction refers to searching preferentially in the local storage and, if the file is not found, searching in the cloud through the communication component.
In any of the above schemes, preferably, the communication component includes at least one of wifi, wireless, 2G/3G/4G/5G and GPRS.
In any of the above schemes, preferably, the play instruction refers to playing a cached audio file or an audio file in the local storage through a playback device.
A third aspect of the present invention discloses a sound collection apparatus, which includes a housing and further includes the system described in any of the above.
Preferably, the sound collection apparatus is fixedly mounted on a smart device.
In any of the above schemes, preferably, the smart device includes at least one of a smartphone, a smart camera, a smart earphone and other smart devices.
Through the processing of the audio signal, the present invention realizes high-definition voice command input and frees the hands, making wearable devices more convenient in application and closer to people's usage habits.
FIG. 1 is a flow chart of a preferred embodiment of the audio playing method according to the present invention.
FIG. 2 is a block diagram of a preferred embodiment of the audio playing system according to the present invention.
FIG. 3 is a schematic cross-sectional view of an embodiment of the bone conduction microphone of the audio playing apparatus according to the present invention.
FIG. 4 is a schematic structural view of an embodiment of the smart earphone of the audio playing apparatus according to the present invention.
FIG. 5 is a flow chart of an embodiment of the noise reduction method of the audio playing method according to the present invention.
FIG. 6 is a flow chart of an embodiment of the dialect recognition module initialization method of the audio playing method according to the present invention.
FIG. 7 is a flow chart of an embodiment of the dialect recognition method of the audio playing method according to the present invention.
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
实施例一
如图1、图2所示,执行步骤100,开启输入模块200(包括音频输入模块201、手写输入模块202、键盘输入模块203)。执行步骤110,判断输入模块类型。如果输入模块是音频输入模块201(包括一个骨传导麦克风和一个麦克风),则执行步骤120,
音频特性检测子模块211对输入的音频信号(包括从骨传导麦克风收集到的第一音频信号和从麦克风收集到的第二音频信号)进行音频特性检测(包括语音检测、噪音检测和相关性特征提取)。音频特性检测的步骤如下:1)提取帧长为20ms的音频数据xi(n),并计算平均能量Ei、过零率ZCRi、短时相关性Ri(k)和短时互相关性Cij(k),其中(N为帧内采样点数):Ei = (1/N)∑n xi(n)^2;ZCRi = (1/2)∑n |sgn(xi(n)) − sgn(xi(n−1))|;Ri(k) = ∑n xi(n)xi(n+k);Cij(k) = ∑n xi(n)xj(n+k)。
2)根据所述平均能量Ei、所述过零率ZCRi、所述短时相关性Ri(k)和所述短时互相关性Cij(k)计算当前帧的非静音概率Pa,i和语音概率Ps,i:Pa,i = Ei·ZCRi/T1,i;Ps,i = maxk[Ri(k)]·maxk[Cij(k)]/T2,i,其中,T1,i为i通道max(Ei*ZCRi)的经验参考值,T2,i为i通道max{max[Ri(k)]*max[Cij(k)]}的经验参考值。3)根据所述i通道当前帧的所述非静音概率和所述语音概率,对照基于相关判决的经验值,判断当前帧的类型,即噪声帧、语音帧还是无噪环境音帧,其中,Ambient为无噪环境音帧,Noise为噪音帧,Speech为语音帧。执行步骤121,主音源判定子模块212根据当前帧的概率数值和判定结果来确定从哪一路提取的当前帧作为当前位置帧的主音源。判定方法如下:1)当某一路为Speech语音帧,而另一路为Ambient无噪环境音帧或者Noise噪音帧时,确定该路作为当前位置帧的主数据通路;2)当某一路为Ambient无噪环境音帧,而另一路为Noise噪音帧时,确定该路作为当前位置帧的主数据通路;3)当两路均为同一种类帧时,确定概率数值最大的通道作为当前位置帧的主数据通路。执行步骤122,主音源中仍然包含少量噪声数据,降噪子模块213根据主数据通路语音帧前后关联的噪音帧获得噪声频谱特性,并对语音帧在频域上对噪声频谱成分进行抑制。执行步骤123,输出语音操作指令。
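步骤120中各特征量的计算与帧类型的粗分可示意如下(Python,其中阈值t_active、t_speech及分类映射均为假设的示例值,并非本发明给定的经验参考值):

```python
import numpy as np

# 音频特性检测的示意:对一帧音频计算平均能量、过零率、
# 短时相关性与短时互相关性,并据假设的经验阈值粗分帧类型
def frame_features(x, y, k_max=32):
    """x为本通道一帧数据, y为另一通道同一时刻的一帧数据。"""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    energy = np.mean(x ** 2)                          # 平均能量 Ei
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2.0  # 过零率 ZCRi
    r_max = max(np.dot(x[:-k], x[k:]) for k in range(1, k_max))      # max Ri(k)
    c_max = max(np.dot(x[:n - k], y[k:]) for k in range(0, k_max))   # max Cij(k)
    return energy, zcr, r_max, c_max

def classify_frame(energy, zcr, r_max, c_max, t_active=1e-3, t_speech=1.0):
    p_active = energy * zcr / t_active   # 非静音概率(未归一化的示意)
    p_speech = r_max * c_max / t_speech  # 语音概率(未归一化的示意)
    if p_speech >= 1.0:
        return "Speech"   # 语音帧
    if p_active >= 1.0:
        return "Noise"    # 噪音帧
    return "Ambient"      # 无噪环境音帧
```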
当输入模块类型为文字输入时,执行步骤130,判断文字输入类型。如果是手写输入,则执行步骤131,手写文字判断子模块215判断手写输入的文字类型,并识别文字和数字。执行步骤132,手写文字纠错子模块216根据从手写文字判断子模块215得到的文字和数字,智能纠正错字,得到相对准确的文字指令。执行步骤133,输出文字操作指令。如果是键盘输入,则执行步骤132,键盘文字确认子模块218确认输入文字并进行智能纠错,得到相对准确的文字指令,执行步骤133,输出文字操作指令。
执行步骤140,操作指令解析模块220把得到的语音操作指令或文字操作指令进行解析,并生成控制指令(包括搜索指令、筛选指令、缓存指令、下载指令、存储指令和播放指令等)。执行步骤150,音频文件获取模块230执行控制指令,控制指令优先在本地存储模块231中执行,当本地存储模块231无法执行时,则通过网络模块232下载音频文件。执行步骤160,音频文件播放模块240通过音频输出设备播放音频文件。
实施例二
如图3所示,外壳标号为301,振动采集器标号为302,压力传感器为303,信号处理器标号为304,振动腔标号为305,导线标号为306,电路板标号为307,底座标号为308,信号采集部标号为309。
一种骨传导麦克风10,如图3所示,包括外壳301、振动采集器302、压力传感器303、信号处理器304、导线306和电路板307,
外壳301与振动采集器302连接形成一个封闭空间。电路板307设置于封闭空间内外壳301的底部,信号处理器304设置于电路板307上,并与电路板307电路连接。压力传感器303设置于封闭空间内电路板307与振动采集器302之间,与外壳301固定连接。压力传感器303与电路板307通过导线306电路连接。外壳301至少部分为弹性材料制成。
所述压力传感器303为向下凸出的弧面。非平面的压力传感器,尤其是具有弧面的压力传感器,对于声源振动的感知更加灵敏,有利于声源的采集。
压力传感器端部312与所述外壳301上设置的连接部连接。所述连接部为凹部,压力传感器端部312与其卡接。优选的,在连接部与压力传感器端部312的连接处涂抹有密封胶,用于提高振动腔305的密闭性,减小或避免了气囊漏气所造成的声音损失。
所述振动采集器302包括信号采集部309和第一连接部310,所述外壳301包括第二连接部311,如图3所示,所述第一连接部310和第二连接部311固定连接且通过密封胶连接为一密封整体。所述固定连接为卡接。所述第一连接部310为凹形部,所述第二连接部311为凸形部;或所述第一连接部310为凸形部,所述第二连接部311为凹形部。所述凹部与凸部卡接。
所述振动采集器302为弹性材料制成。所述信号采集部309由向上凸起的多个凸起组成。所述凸起连接为一整体。所述凸起为薄壁形的弧面。所述凸起分布在所述振动采集器302表面。所述振动采集器302至少与所述压力传感器303之间形成密闭空腔。所述空腔为振动腔305。
在本实施例中还包括底座308,所述底座308与所述外壳301一体连接。所述电路板307设置于所述底座308上。所述信号处理器304设置于所述电路板307上。所述压力传感器303通过导线306与电路板307连接。
实施例三
图4展示了一个集成了声音采集系统的头戴式耳机400,包括左侧耳机410和右侧耳机430。在左侧耳机410中集中了声音采集系统的核心组成部分,包括标号为420的3G/4G网络,标号为421的wifi/蓝牙,标号为422的LCD显示/触摸屏,标号为423的加速度传感器/陀螺仪,标号为424的GPS,标号为425的骨传导麦克风(左),标号为426的喇叭(左),标号为427的音频信号处理(DAC),标号为428的本地数据存储和标号为429的CPU。3G/4G网络、wifi/蓝牙、LCD显示/触摸屏、加速度传感器/陀螺仪、GPS、音频信号处理(DAC)和本地数据存储分别与CPU相连接,骨传导麦克风(左)和喇叭(左)则与音频信号处理(DAC)相连接。
右侧耳机430中集中了一些辅助组成部分,包括标号为440的喇叭(右),标号为441和443的传感器,标号为442的触控板音乐控制,标号为444的骨传导麦克风(右)和标号为445的电池。喇叭(右)、传感器、触控板音乐控制和电池分别与左侧耳机中的CPU相连接,骨传导麦克风(右)与喇叭(右)相连接。
实施例四
如图5所示,执行步骤500,导入主音频数据。执行步骤510,调取存储器中存储的环境判定数据。执行步骤520,把主音频数据与环境判定数据进行比对,并确定主音频输入时周边的噪音环境。顺序执行步骤530和步骤540,从存储器中调取环境噪音数据,并与主音频数据进行单帧比对。执行步骤550,去掉主音频数据单帧中与环境噪音数据相同的音频数据。执行步骤560,生成有效的不带有噪音的音频数据。
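实施例四中"逐帧比对并去掉与环境噪音相同的音频数据"的流程可示意如下(Python,此处用帧间平方距离近似判断"相同",tol为假设阈值,并非本发明给定):

```python
# 实施例四降噪流程的示意:将主音频数据逐帧与调取的环境噪音数据比对,
# 去掉与噪音帧"相同"的帧,生成不带有噪音的有效音频数据
def strip_noise_frames(frames, noise_profile, tol=1e-6):
    """frames: 主音频的帧列表; noise_profile: 环境噪音数据的帧列表。"""
    def close(f, g):
        # 以帧间平方距离近似"与环境噪音数据相同"
        return sum((a - b) ** 2 for a, b in zip(f, g)) <= tol
    return [f for f in frames if not any(close(f, g) for g in noise_profile)]
```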
实施例五
音频播放系统中还包括方言识别模块,用于识别通过音频输入模块采集到的方言。
如图6所示,顺序执行步骤600和步骤610,启动方言识别模块初始化流程并按照提示输入相应的语音。执行步骤620,根据输入的语音,通过网络模块连接到云服务器,查看是否保存在已有的方言库中。如果已经保存在已有的方言库中,则执行步骤630,调取并下载该方言库。执行步骤640,按照提示输入相应的语音,与下载到本地存储器的方言库进行比对纠错,按照自己的习惯对方言库进行微调。执行步骤650,保存在本地存储器中。
如果在已有的方言库中没有该方言,则执行步骤621,通过音频输入模块输入语音,并通过手写输入模块或者键盘输入模块输入对应的词语。执行步骤622,完成全部常用语校对输入后,保存在本地存储器中。执行步骤623上传到云服务器的方言库中。
实施例六
如图7所示,执行步骤600,通过语音输入模块输入语音。执行步骤610,判断该语音对应的方言是否保存在本地存储器中。如果保存在本地存储器中,则顺序执行步骤620和步骤650,调取本地存储器中的方言库并进行方言比对。执行步骤660,根据方言比对结果生成控制指令。
如果本地存储器中没有保存该种方言,则执行步骤630,在云服务器中进行方言检索比对,确定适合的方言库。顺序执行步骤640和步骤650,通过网络模块下载相应的方言库并进行方言比对。执行步骤660,根据方言比对结果生成控制指令。
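实施例六中"本地方言库优先、云端检索下载兜底"的流程可示意如下(Python,其中local_lib、cloud_search、cloud_download、match均为假设性接口,并非本发明给定):

```python
# 方言识别流程的示意:本地存储器中有方言库则直接比对,
# 否则先在云服务器中检索并下载合适的方言库,再进行方言比对
def recognize_dialect(speech, local_lib, cloud_search, cloud_download, match):
    """返回方言比对得到的指令文本,用于后续生成控制指令。"""
    if local_lib is None:
        lib_id = cloud_search(speech)     # 云端方言检索比对,确定合适的方言库
        local_lib = cloud_download(lib_id)  # 通过网络模块下载相应的方言库
    return match(speech, local_lib)       # 方言比对
```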
为了更好地理解本发明,以上结合本发明的具体实施例做了详细描述,但并非是对本发明的限制。凡是依据本发明的技术实质对以上实施例所做的任何简单修改,均仍属于本发明技术方案的范围。本说明书中每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似
的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本发明的方法、装置和系统。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本发明的方法和系统。用于所述方法的步骤的上述顺序仅是为了进行说明,本发明的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本发明实施为记录在记录介质中的程序,这些程序包括用于实现根据本发明的方法的机器可读指令。因而,本发明还覆盖存储用于执行根据本发明的方法的程序的记录介质。
本发明的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本发明限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本发明的原理和实际应用,并且使本领域的普通技术人员能够理解本发明从而设计适于特定用途的带有各种修改的各种实施例。
Claims (43)
- 一种音频播放方法,包括开启输入模块,其特征在于,还包括以下步骤:步骤1:通过所述输入模块获取操作指令;步骤2:解析所述操作指令,并生成控制指令;步骤3:执行所述控制指令,获取音频文件;步骤4:播放所述音频文件。
- 如权利要求1所述的方法,其特征在于:所述输入模块包括音频输入模块、文字输入模块和手势输入模块中至少一种;所述音频输入模块接受音频指令信号后生成有效音频指令信号;所述文字输入模块生成文字指令信号;所述手势输入模块生成手势指令信号。
- 如权利要求2所述的方法,其特征在于:所述音频输入模块包括至少一个骨传导麦克风和至少一个麦克风。
- 如权利要求3所述的方法,其特征在于:所述音频指令信号包括第一音频信号和第二音频信号。
- 如权利要求4所述的方法,其特征在于:所述第一音频信号是指利用所述骨传导麦克风采集由于用户身体的震动产生的机械波。
- 如权利要求5所述的方法,其特征在于:所述第二音频信号是指利用所述麦克风采集所述机械波生成的时间范围内的声波。
- 如权利要求6所述的方法,其特征在于:通过所述音频输入模块获取所述操作指令的方法包括以下子步骤:步骤11:对采集到的所述音频指令信号进行音频特性检测;步骤12:进行主音源判定;步骤13:消除噪声;步骤14:输出所述有效音频指令信号。
- 如权利要求7所述的方法,其特征在于:所述音频特性检测包括语音检测、噪音检测和相关性特征提取中至少一种。
- 如权利要求11所述的方法,其特征在于:所述步骤12为根据主音源判定原则确定主数据通路。
- 如权利要求13所述的方法,其特征在于:所述步骤13为根据所述主数据通路Speech音频帧前后关联的Noise噪音帧获得噪声频谱特性,并对Speech音频帧在频域上对噪声频谱成分进行抑制。
- 如权利要求14所述的方法,其特征在于:所述操作指令包括所述有效音频指令信号、所述文字指令信号或所述手势指令信号中的至少一种。
- 如权利要求1所述的方法,其特征在于:所述控制指令包括搜索指令、筛选指令、缓存指令、下载指令、存储指令和播放指令中至少一种。
- 如权利要求16所述的方法,其特征在于:所述搜索指令是指优先在本地存储器中进行搜索,若没有则通过通信组件在云端进行搜索。
- 如权利要求17所述的方法,其特征在于:所述通信组件包括wifi、无线、2G/3G/4G/5G和GPRS中至少一种。
- 如权利要求18所述的方法,其特征在于:所述获取音频文件是指执行所述缓存指令或下载指令,通过所述通信组件从云端得到音频文件。
- 如权利要求19所述的方法,其特征在于:所述播放指令是指通过放音设备播放缓存音频文件或本地存储器中的音频文件。
- 一种声音采集系统,包括输入模块,其特征在于,还包括以下模块:操作指令获取模块:通过所述输入模块获取操作指令;操作指令解析模块:解析所述操作指令,并生成控制指令;音频文件获取模块:用于执行所述控制指令,获取音频文件;音频文件播放模块:用于把所述音频文件的有效音频数据推送给终端设备。
- 如权利要求21所述的声音采集系统,其特征在于:所述输入模块包括音频输入模块、文字输入模块和手势输入模块中至少一种;所述音频输入模块接受音频指令信号后生成有效音频指令信号;所述文字输入模块生成文字指令信号;所述手势输入模块生成手势指令信号。
- 如权利要求22所述的声音采集系统,其特征在于:所述音频输入模块包括至少一个骨传导麦克风和至少一个麦克风。
- 如权利要求23所述的声音采集系统,其特征在于:所述音频信号包括第一音频信号和第二音频信号。
- 如权利要求24所述的声音采集系统,其特征在于:所述第一音频信号是指利用所述骨传导麦克风采集由于用户身体的震动产生的机械波。
- 如权利要求25所述的声音采集系统,其特征在于:所述第二音频信号是指利用所述麦克风采集所述机械波生成的时间范围内的声波。
- 如权利要求26所述的声音采集系统,其特征在于:所述操作指令获取模块还包括以下子模块:音频特性检测子模块:用于对采集到的所述音频信号进行音频特性检测;主音源判定子模块:用于进行主音源判定;降噪子模块:用于消除噪声;音频指令输出子模块:用于输出所述有效音频指令信号。
- 如权利要求27所述的声音采集系统,其特征在于:所述音频特性检测包括语音检测、噪音检测和相关性特征提取中至少一种。
- 如权利要求31所述的声音采集系统,其特征在于:所述主音源判定子模块用于根据主音源判定原则确定主数据通路。
- 如权利要求33所述的声音采集系统,其特征在于:所述降噪子模块用于根据所述主数据通路Speech音频帧前后关联的Noise噪音帧获得噪声频谱特性,并对Speech音频帧在频域上对噪声频谱成分进行抑制。
- 如权利要求34所述的声音采集系统,其特征在于:所述操作指令包括所述有效音频指令信号、所述文字指令信号或所述手势指令信号中的至少一种。
- 如权利要求21所述的声音采集系统,其特征在于:所述控制指令包括搜索指令、筛选指令、缓存指令、下载指令、存储指令和播放指令中至少一种。
- 如权利要求36所述的声音采集系统,其特征在于:所述搜索指令是指优先在本地存储器中进行搜索,若没有则通过通信组件在云端进行搜索。
- 如权利要求37所述的声音采集系统,其特征在于:所述通信组件包括wifi、无线、2G/3G/4G/5G和GPRS中至少一种。
- 如权利要求38所述的声音采集系统,其特征在于:所述获取音频文件是指执行所述缓存指令或下载指令,通过所述通信组件从云端得到音频文件。
- 如权利要求39所述的声音采集系统,其特征在于:所述播放指令是指通过放音设备播放缓存音频文件或本地存储器中的音频文件。
- 一种声音采集装置,包括外壳,其特征在于,还包括如权利要求21-40中任一所述的系统。
- 如权利要求41所述的声音采集装置,其特征在于:所述声音采集装置固定安装在智能设备上。
- 如权利要求42所述的声音采集装置,其特征在于:所述智能设 备包括:智能手机、智能相机、智能耳机和其他智能设备中至少一种。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780001736.2A CN108475512B (zh) | 2016-11-03 | 2017-06-20 | 一种音频播放方法、系统和装置 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IBPCT/IB2016/001579 | 2016-11-03 | ||
PCT/IB2016/001579 WO2018083511A1 (zh) | 2016-11-03 | 2016-11-03 | 一种音频播放装置及方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018082315A1 true WO2018082315A1 (zh) | 2018-05-11 |
Family
ID=62075847
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2016/001579 WO2018083511A1 (zh) | 2016-11-03 | 2016-11-03 | 一种音频播放装置及方法 |
PCT/CN2017/089207 WO2018082315A1 (zh) | 2016-11-03 | 2017-06-20 | 一种音频播放方法、系统和装置 |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2016/001579 WO2018083511A1 (zh) | 2016-11-03 | 2016-11-03 | 一种音频播放装置及方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108475512B (zh) |
WO (2) | WO2018083511A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113628622A (zh) * | 2021-08-24 | 2021-11-09 | 北京达佳互联信息技术有限公司 | 语音交互方法、装置、电子设备及存储介质 |
CN116318493B (zh) * | 2023-03-21 | 2023-10-24 | 四川贝能达交通设备有限公司 | 一种应急广播控制装置 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN2448030Y (zh) * | 2000-08-23 | 2001-09-12 | 吴惠琪 | 喉震式免持听筒麦克风装置 |
EP1638084A1 (en) * | 2004-09-17 | 2006-03-22 | Microsoft Corporation | Method and apparatus for multi-sensory speech enhancement |
EP2458586A1 (en) * | 2010-11-24 | 2012-05-30 | Koninklijke Philips Electronics N.V. | System and method for producing an audio signal |
CN103208291A (zh) * | 2013-03-08 | 2013-07-17 | 华南理工大学 | 一种可用于强噪声环境的语音增强方法及装置 |
CN105097001A (zh) * | 2014-05-13 | 2015-11-25 | 北京奇虎科技有限公司 | 音频播放方法和装置 |
US20160302003A1 (en) * | 2015-04-08 | 2016-10-13 | Cornell University | Sensing non-speech body sounds |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101304619A (zh) * | 2007-05-11 | 2008-11-12 | 鸿富锦精密工业(深圳)有限公司 | 无线耳机和音频设备以及音频播放方法 |
FR2974655B1 (fr) * | 2011-04-26 | 2013-12-20 | Parrot | Combine audio micro/casque comprenant des moyens de debruitage d'un signal de parole proche, notamment pour un systeme de telephonie "mains libres". |
WO2014017679A1 (ko) * | 2012-07-26 | 2014-01-30 | Bang Choon Hee | 귀걸이길이조절이 가능한 귀걸이용 걸이이어폰 및 길이조절매듭부 |
TWM445353U (zh) * | 2012-08-16 | 2013-01-21 | Sound Team Entpr Co Ltd | 附加耳機之毛線帽 |
US9043211B2 (en) * | 2013-05-09 | 2015-05-26 | Dsp Group Ltd. | Low power activation of a voice activated device |
US20150199950A1 (en) * | 2014-01-13 | 2015-07-16 | DSP Group | Use of microphones with vsensors for wearable devices |
CN104618831A (zh) * | 2015-01-27 | 2015-05-13 | 深圳市百泰实业有限公司 | 无线智能耳机 |
2016
- 2016-11-03 WO PCT/IB2016/001579 patent/WO2018083511A1/zh active Application Filing
2017
- 2017-06-20 WO PCT/CN2017/089207 patent/WO2018082315A1/zh active Application Filing
- 2017-06-20 CN CN201780001736.2A patent/CN108475512B/zh active Active
Also Published As
Publication number | Publication date |
---|---|
CN108475512A (zh) | 2018-08-31 |
WO2018083511A1 (zh) | 2018-05-11 |
CN108475512B (zh) | 2023-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11676581B2 (en) | Method and apparatus for evaluating trigger phrase enrollment | |
US10964300B2 (en) | Audio signal processing method and apparatus, and storage medium thereof | |
CN103095911B (zh) | 一种通过语音唤醒寻找手机的方法及系统 | |
CN108346425B (zh) | 一种语音活动检测的方法和装置、语音识别的方法和装置 | |
US9275638B2 (en) | Method and apparatus for training a voice recognition model database | |
US9570076B2 (en) | Method and system for voice recognition employing multiple voice-recognition techniques | |
US10687142B2 (en) | Method for input operation control and related products | |
CN110322760B (zh) | 语音数据生成方法、装置、终端及存储介质 | |
KR20140144233A (ko) | 성문 특징 모델 갱신 방법 및 단말 | |
WO2020155490A1 (zh) | 基于语音分析的管理音乐的方法、装置和计算机设备 | |
CN105580071B (zh) | 用于训练声音识别模型数据库的方法和装置 | |
WO2017154282A1 (ja) | 音声処理装置および音声処理方法 | |
US10783903B2 (en) | Sound collection apparatus, sound collection method, recording medium recording sound collection program, and dictation method | |
CN110620970A (zh) | 一种耳机触控方法、装置、无线耳机及tws耳机 | |
US11437022B2 (en) | Performing speaker change detection and speaker recognition on a trigger phrase | |
CN110097895B (zh) | 一种纯音乐检测方法、装置及存储介质 | |
US11348584B2 (en) | Method for voice recognition via earphone and earphone | |
US20180144740A1 (en) | Methods and systems for locating the end of the keyword in voice sensing | |
WO2018082315A1 (zh) | 一种音频播放方法、系统和装置 | |
JP2020160431A (ja) | 音声認識装置、音声認識方法及びそのプログラム | |
CN107977187B (zh) | 一种混响调节方法及电子设备 | |
CN112997144A (zh) | 一种录音方法、装置、电子设备和计算机可读存储介质 | |
CN114093357A (zh) | 控制方法、智能终端及可读存储介质 | |
KR20130116128A (ko) | 티티에스를 이용한 음성인식 질의응답 시스템 및 그것의 운영방법 | |
CN111739493A (zh) | 音频处理方法、装置及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17866611 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17866611 Country of ref document: EP Kind code of ref document: A1 |