WO2017124876A1 - Voice playing method and device - Google Patents
Voice playing method and device
- Publication number
- WO2017124876A1 (application PCT/CN2016/111636)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio stream
- training
- playing
- impulse
- streaming media
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Definitions
- The invention belongs to the field of speech recognition technology and, in particular, relates to a voice playing method and device.
- Voiceprint recognition is a technique that identifies people by their voices. Because the vocal organs people use when speaking differ from person to person, the voiceprints of any two voices are different, so a voiceprint can serve as a biometric that characterizes individual differences. In other words, the distinguishing characteristics of an individual can be captured by building a voiceprint feature model, and that model can then be used to tell individuals apart. At present, applying voiceprint feature models involves a dilemma, which centers on the choice of training corpus length. In general, a longer voiceprint training corpus yields a more accurate feature model and higher recognition accuracy, but is less practical; a shorter training corpus is more practical, but the recognition accuracy is lower. Practical applications, such as voice unlocking of a mobile phone screen, require both high recognition accuracy for security and a training corpus that is not too long, so that the feature remains practical.
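As a rough illustration of how a voiceprint feature model can separate speakers, the sketch below averages per-frame feature vectors into one template per speaker and identifies a probe utterance by cosine similarity. The feature values, the function names (`build_model`, `cosine`, `identify`), and the choice of cosine similarity are all assumptions made for illustration; the source does not specify a feature model.

```python
import math


def build_model(frames):
    """Average equal-length feature vectors into a single speaker template."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(models, frames):
    """Return the speaker whose template is most similar to the probe frames."""
    probe = build_model(frames)
    return max(models, key=lambda name: cosine(models[name], probe))


# Hypothetical three-dimensional features for two enrolled speakers.
models = {
    "alice": build_model([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "bob": build_model([[0.1, 1.0, 0.3], [0.0, 0.9, 0.4]]),
}
print(identify(models, [[0.95, 0.15, 0.05]]))
```

In this toy setting, a longer training corpus simply means more frames feeding each average, which is one way to picture why longer corpora tend to give steadier templates.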
- In the existing approach, the voiceprint feature model is built by having the user manually train several times during the voiceprint registration phase. Each training session uses a short phrase, and the phrases are finally combined into a longer training corpus from which the feature model is generated.
- However, having the user manually enter a training corpus of a certain length several times gives a poor user experience and is not very practical; the combined training corpus is still limited in length, so a more accurate feature model cannot be generated and the recognition accuracy cannot be improved further; and changes in tone of speech, mood swings, and the like also affect the accuracy of the model. Therefore, how to improve the accuracy of the voiceprint feature model, and with it the recognition accuracy, while keeping the approach practical is an urgent problem to be solved.
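- A voice playing method, comprising:
- acquiring an original audio stream that includes at least one speaker;
- training the original audio stream according to a preset training algorithm; and
- loading the trained audio stream into a streaming media file for playback.
- The method further includes: establishing a training sample library.
- Training the original audio stream according to the preset training algorithm includes: segmenting the original audio stream to obtain an analog audio stream and a real audio stream; and applying an impulse to the analog audio stream and the real audio stream, producing an impulse audio stream.
- Loading the trained audio stream into the streaming media file for playback includes: determining whether a matching sample object for the impulse audio stream is found in the training sample library; and, if a match is found, loading the impulse audio stream into the streaming media file as the trained audio stream for playback.
- The method further includes: testing whether the streaming media file is distorted.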
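The match-then-load step can be sketched as follows. This is only a minimal sketch: the source does not say how a "matching sample object" is decided, so the per-sample tolerance test, the dictionary-based sample library, the list used as a streaming media file, and the names `find_match` and `load_if_matched` are all hypothetical.

```python
def find_match(stream, library, tolerance=0.05):
    """Return the name of the first library sample close to `stream`, else None.

    Assumption: two streams match when they have the same length and no
    sample differs by more than `tolerance`.
    """
    for name, sample in library.items():
        if len(sample) != len(stream):
            continue
        worst = max((abs(a - b) for a, b in zip(stream, sample)), default=0.0)
        if worst <= tolerance:
            return name
    return None


def load_if_matched(stream, library, media_file):
    """Append `stream` to `media_file` only when the library holds a match."""
    name = find_match(stream, library)
    if name is None:
        return False
    media_file.append({"matched_sample": name, "samples": list(stream)})
    return True


# Hypothetical training sample library and an empty streaming media file.
library = {"s1": [0.0, 0.5, 1.0]}
media_file = []
print(load_if_matched([0.01, 0.5, 0.98], library, media_file))
print(load_if_matched([0.5, 0.5, 0.5], library, media_file))
```

Keeping the match test separate from the load step means the matching rule can be swapped out without touching how the stream is stored.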
- A voice playback device, comprising:
- an obtaining module configured to acquire an original audio stream that includes at least one speaker;
- a training module configured to train the original audio stream according to a preset training algorithm; and
- a playing module configured to load the trained audio stream into a streaming media file for playback.
- The device further includes:
- an establishing module configured to build a training sample library.
- The training module includes:
- a dividing unit configured to segment the original audio stream to obtain an analog audio stream and a real audio stream; and
- an impulse unit configured to apply an impulse to the analog audio stream and the real audio stream, producing an impulse audio stream.
- The playing module includes:
- a determining unit configured to determine whether a matching sample object for the impulse audio stream is found in the training sample library; and
- a playing unit configured to, when a match is found, load the impulse audio stream into the streaming media file as the trained audio stream for playback.
- The device further includes:
- a test module configured to test whether the streaming media file is distorted.
- A voice playing method, comprising: acquiring an original audio stream that includes at least one speaker; training the original audio stream according to a preset training algorithm; and loading the trained audio stream into a streaming media file for playback. In this way, audio data with higher accuracy and less distortion can be played.
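The three steps of the method (acquire, train, load) can be pictured as a small pipeline. The sketch below is an illustration under stated assumptions, not the patented implementation: `StreamingMediaFile`, `play_voice`, and `peak_normalize` are hypothetical names, and peak normalization is only a placeholder for the unspecified "preset training algorithm".

```python
from dataclasses import dataclass, field
from typing import Callable, List

Samples = List[float]


@dataclass
class StreamingMediaFile:
    """Hypothetical stand-in for the streaming media file: a list of tracks."""
    tracks: List[Samples] = field(default_factory=list)

    def load(self, stream: Samples) -> None:
        self.tracks.append(list(stream))


def peak_normalize(stream: Samples) -> Samples:
    """Placeholder training step (an assumption): scale the peak to 1.0."""
    peak = max((abs(x) for x in stream), default=0.0)
    return list(stream) if peak == 0.0 else [x / peak for x in stream]


def play_voice(acquire: Callable[[], Samples],
               train: Callable[[Samples], Samples],
               media_file: StreamingMediaFile) -> Samples:
    original = acquire()        # step 1: acquire the original audio stream
    trained = train(original)   # step 2: train it with the preset algorithm
    media_file.load(trained)    # step 3: load the result for playback
    return trained


media = StreamingMediaFile()
result = play_voice(lambda: [0.5, -0.25, 0.0], peak_normalize, media)
print(result)
```

Passing the acquisition and training steps in as callables keeps the pipeline independent of any particular microphone source or training algorithm.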
- FIG. 1 is a flowchart of the voice playing method of the present invention.
- FIG. 2 is a block diagram of the modules of the voice playback device of the present invention.
- Embodiment 1:
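- A voice playing method; the method includes:
- S101: acquire an original audio stream that includes at least one speaker;
- train the original audio stream according to a preset training algorithm; and
- load the trained audio stream into a streaming media file for playback.
- The method further includes: establishing a training sample library.
- Training the original audio stream according to the preset training algorithm includes: segmenting the original audio stream to obtain an analog audio stream and a real audio stream; and applying an impulse to the analog audio stream and the real audio stream, producing an impulse audio stream.
- Loading the trained audio stream into the streaming media file for playback includes: determining whether a matching sample object for the impulse audio stream is found in the training sample library; and, if a match is found, loading the impulse audio stream into the streaming media file as the trained audio stream for playback.
- The method further includes: testing whether the streaming media file is distorted.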
- A voice playback device, comprising:
- an obtaining module 201 configured to acquire an original audio stream that includes at least one speaker;
- a training module 202 configured to train the original audio stream according to a preset training algorithm; and
- a playing module 203 configured to load the trained audio stream into a streaming media file for playback.
- The device further includes:
- an establishing module configured to build a training sample library.
- The training module includes:
- a dividing unit configured to segment the original audio stream to obtain an analog audio stream and a real audio stream; and
- an impulse unit configured to apply an impulse to the analog audio stream and the real audio stream, producing an impulse audio stream.
- The playing module includes:
- a determining unit configured to determine whether a matching sample object for the impulse audio stream is found in the training sample library; and
- a playing unit configured to, when a match is found, load the impulse audio stream into the streaming media file as the trained audio stream for playback.
- The device further includes:
- a test module configured to test whether the streaming media file is distorted.
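The source does not say how the test module decides that a streaming media file is distorted. One common way to quantify distortion is the signal-to-noise ratio between a reference stream and the stream read back from the file; the sketch below assumes that metric and a 20 dB threshold purely for illustration, and the names `snr_db` and `is_distorted` are hypothetical.

```python
import math


def snr_db(reference, observed):
    """Signal-to-noise ratio in dB, treating (reference - observed) as noise."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, observed))
    if noise == 0:
        return math.inf      # identical streams: no distortion at all
    if signal == 0:
        return -math.inf     # silent reference but non-silent error
    return 10 * math.log10(signal / noise)


def is_distorted(reference, observed, min_snr_db=20.0):
    """Flag the stream as distorted when lengths differ or SNR is too low."""
    if len(reference) != len(observed):
        return True
    return snr_db(reference, observed) < min_snr_db


reference = [1.0, -1.0, 1.0, -1.0]
print(is_distorted(reference, reference))
print(is_distorted(reference, [0.5, -0.5, 0.5, -0.5]))
```

The threshold is a tunable assumption: a stricter application would raise `min_snr_db`, a more tolerant one would lower it.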
- The device may be a terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, and the like.
- The RF (Radio Frequency) circuit can be used to receive and transmit signals while information is being sent or received or during a call.
- In particular, after downlink information from the base station is received, it is passed to the processor for processing; in addition, uplink data is sent to the base station.
- RF circuits include, but are not limited to, an antenna, at least one amplifier, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like.
- The RF circuit can also communicate with the network and other devices through wireless communication.
- The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Message Service), and the like.
- The memory can be used to store software programs and modules; the processor performs the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory.
- The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the applications required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created through use of the mobile phone (such as audio data or a phone book).
- The memory may include high-speed random access memory, and may also include nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
- The input unit can be used to receive numeric or character input and to generate key signal inputs related to user settings and function control of the handset.
- The input unit may include a touch panel and other input devices.
- The touch panel, also referred to as a touch screen, can collect touch operations performed by the user on or near it (such as operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program.
- The touch panel may include two parts: a touch detection device and a touch controller.
- The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends the coordinates to the processor.
- Touch panels can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave.
- In addition to the touch panel, the input unit may include other input devices. Specifically, the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as a volume control button or a power button), a trackball, a mouse, a joystick, and the like.
- The display unit can be used to display information entered by the user or provided to the user, as well as the various menus of the mobile phone.
- The display unit may include a display panel.
- The display panel may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
- The touch panel may cover the display panel. When the touch panel detects a touch operation on or near it, it passes the operation to the processor to determine the type of touch event, and the processor then provides the corresponding visual output on the display panel according to the type of touch event.
- The handset may also include at least one type of sensor, such as a light sensor, a motion sensor, and other sensors.
- The light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel according to the ambient light level, and the proximity sensor can turn off the display panel and/or the backlight when the mobile phone is moved to the ear.
- As one kind of motion sensor, the accelerometer can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, the magnitude and direction of gravity. It can be used by applications that identify the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and by vibration-recognition functions (such as a pedometer or tap detection). The mobile phone can also be configured with a gyroscope, barometer, hygrometer, thermometer, infrared sensor, and other sensors, which are not described further here.
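To make the accelerometer-based posture detection concrete, the sketch below classifies a single three-axis reading as portrait, landscape, flat, or moving. It is a simplified illustration under assumptions not found in the source: readings are in m/s^2, the x axis runs along the short edge of the screen and the y axis along the long edge, and the thresholds and the name `screen_orientation` are hypothetical.

```python
import math

GRAVITY = 9.81  # standard gravity in m/s^2


def screen_orientation(ax, ay, az, min_tilt=3.0, max_deviation=1.5):
    """Classify orientation from one accelerometer reading in m/s^2.

    A reading whose magnitude is far from gravity means the phone is
    accelerating, so gravity cannot be read off directly.
    """
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if abs(magnitude - GRAVITY) > max_deviation:
        return "moving"
    if max(abs(ax), abs(ay)) < min_tilt:
        return "flat"  # gravity is mostly along the z axis
    return "portrait" if abs(ay) >= abs(ax) else "landscape"


print(screen_orientation(0.0, 9.81, 0.0))
print(screen_orientation(9.81, 0.0, 0.0))
print(screen_orientation(0.2, 0.1, 9.8))
```

A real implementation would typically smooth several readings before switching the screen, so that brief shakes do not flip the display.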
- The audio circuit, speaker, and microphone provide an audio interface between the user and the phone.
- On the one hand, the audio circuit can convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output. On the other hand, the microphone converts the collected sound signal into an electrical signal, which the audio circuit receives and converts into audio data.
- The audio data is then output to the processor for processing and sent via the RF circuit to, for example, another handset, or output to the memory for further processing.
- WiFi is a short-range wireless transmission technology.
- Through the WiFi module, the mobile phone can help users send and receive e-mail, browse web pages, and access streaming media; the module provides users with wireless broadband Internet access.
- The mobile phone also includes a power source (such as a battery) that supplies power to the various components.
- The power source can be logically connected to the processor through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.
- The mobile phone may further include a camera, a Bluetooth module, and the like, which are not described here.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Telephone Function (AREA)
Claims (10)
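- A voice playing method, characterized in that the method comprises: acquiring an original audio stream that includes at least one speaker; training the original audio stream according to a preset training algorithm; and loading the trained audio stream into a streaming media file for playback.
- The method according to claim 1, characterized in that the method further comprises: establishing a training sample library.
- The method according to claim 1 or 2, characterized in that training the original audio stream according to the preset training algorithm comprises: segmenting the original audio stream to obtain an analog audio stream and a real audio stream; and applying an impulse to the analog audio stream and the real audio stream, according to the impulse audio stream.
- The method according to claim 3, characterized in that loading the trained audio stream into the streaming media file for playback comprises: determining whether a matching sample object for the impulse audio stream is found in the training sample library; and, if a match is found, loading the impulse audio stream into the streaming media file as the trained audio stream for playback.
- The method according to claim 1, characterized in that the method further comprises: testing whether the streaming media file is distorted.
- A voice playing device, characterized in that the device comprises: an obtaining module configured to acquire an original audio stream that includes at least one speaker; a training module configured to train the original audio stream according to a preset training algorithm; and a playing module configured to load the trained audio stream into a streaming media file for playback.
- The device according to claim 6, characterized in that the device further comprises: an establishing module configured to establish a training sample library.
- The device according to claim 7, characterized in that the training module comprises: a dividing unit configured to segment the original audio stream to obtain an analog audio stream and a real audio stream; and an impulse unit configured to apply an impulse to the analog audio stream and the real audio stream, according to the impulse audio stream.
- The device according to claim 8, characterized in that the playing module comprises: a determining unit configured to determine whether a matching sample object for the impulse audio stream is found in the training sample library; and a playing unit configured to, when a match is found, load the impulse audio stream into the streaming media file as the trained audio stream for playback.
- The device according to claim 6, characterized in that the device further comprises: a test module configured to test whether the streaming media file is distorted.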
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610036759.4A CN105632489A (zh) | 2016-01-20 | 2016-01-20 | Voice playing method and device |
CN201610036759.4 | 2016-01-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017124876A1 (zh) | 2017-07-27 |
Family
ID=56047336
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/111636 WO2017124876A1 (zh) | 2016-01-20 | 2016-12-23 | Voice playing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105632489A (zh) |
WO (1) | WO2017124876A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105632489A (zh) * | 2016-01-20 | 2016-06-01 | 曾戟 | Voice playing method and device |
CN113571054B (zh) * | 2020-04-28 | 2023-08-15 | 中国移动通信集团浙江有限公司 | Speech recognition signal preprocessing method, apparatus, device, and computer storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
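CN101197131A (zh) * | 2006-12-07 | 2008-06-11 | 积体数位股份有限公司 | Random voiceprint password verification system, random voiceprint password lock, and method for generating the same |
CN101321387A (zh) * | 2008-07-10 | 2008-12-10 | 中国移动通信集团广东有限公司 | Voiceprint recognition method and system based on a communication system |
CN102446505A (zh) * | 2010-10-15 | 2012-05-09 | 盛乐信息技术(上海)有限公司 | Joint factor analysis method and joint factor analysis voiceprint authentication method |
CN102760434A (zh) * | 2012-07-09 | 2012-10-31 | 华为终端有限公司 | Method for updating a voiceprint feature model, and terminal |
CN102781075A (zh) * | 2011-05-12 | 2012-11-14 | 中兴通讯股份有限公司 | Method for reducing call power consumption of a mobile terminal, and mobile terminal |
CN105632489A (zh) * | 2016-01-20 | 2016-06-01 | 曾戟 | Voice playing method and device |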
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923857A (zh) * | 2009-06-17 | 2010-12-22 | 复旦大学 | Scalable speech recognition method for human-computer interaction |
CN104078051B (zh) * | 2013-03-29 | 2018-09-25 | 南京中兴软件有限责任公司 | Human voice extraction method and system, and human voice audio playing method and device |
US9665570B2 (en) * | 2013-10-11 | 2017-05-30 | International Business Machines Corporation | Computer-based analysis of virtual discussions for products and services |
KR20150145024A (ko) * | 2014-06-18 | 2015-12-29 | 한국전자통신연구원 | Terminal and server of a speaker-adaptive speech recognition system, and operating method thereof |
CN105096941B (zh) * | 2015-09-02 | 2017-10-31 | 百度在线网络技术(北京)有限公司 | Speech recognition method and device |
-
2016
- 2016-01-20 CN CN201610036759.4A patent/CN105632489A/zh active Pending
- 2016-12-23 WO PCT/CN2016/111636 patent/WO2017124876A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN105632489A (zh) | 2016-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9685161B2 (en) | Method for updating voiceprint feature model and terminal | |
US10964300B2 (en) | Audio signal processing method and apparatus, and storage medium thereof | |
US11355157B2 (en) | Special effect synchronization method and apparatus, and mobile terminal | |
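WO2017215649A1 (zh) | Sound effect adjustment method and user terminal | |
WO2020029906A1 (zh) | Method and apparatus for separating multi-speaker speech | |
CN108538320B (zh) | Recording control method and apparatus, readable storage medium, and terminal | |
US9275638B2 (en) | Method and apparatus for training a voice recognition model database | |
WO2021012900A1 (zh) | Vibration control method and apparatus, mobile terminal, and computer-readable storage medium | |
WO2017215660A1 (zh) | Scene sound effect control method and electronic device | |
CN108511002B (zh) | Dangerous-event sound signal recognition method, terminal, and computer-readable storage medium | |
CN104581221A (zh) | Method and apparatus for live video streaming | |
CN107886969B (zh) | Audio playing method and audio playing apparatus | |
WO2017215635A1 (zh) | Sound effect processing method and mobile terminal | |
CN110097895B (zh) | Pure music detection method, apparatus, and storage medium | |
CN106506437B (zh) | Audio data processing method and device | |
WO2019042049A1 (zh) | Picture processing method and mobile terminal | |
WO2017215511A1 (zh) | Scene sound effect control method and related product | |
WO2021098676A1 (zh) | Control method and electronic device | |
CN108763475B (zh) | Recording method, recording apparatus, and terminal device | |
CN110830368A (zh) | Instant messaging message sending method and electronic device | |
CN110378677B (zh) | Red envelope claiming method, apparatus, mobile terminal, and storage medium | |
WO2017124876A1 (zh) | Voice playing method and device | |
CN108632465A (zh) | Voice input method and mobile terminal | |
WO2020118560A1 (zh) | Recording method, apparatus, electronic device, and computer-readable storage medium | |
CN108418961B (zh) | Audio playing method and mobile terminal |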
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16886136 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16886136 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 29..01.2019) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16886136 Country of ref document: EP Kind code of ref document: A1 |