WO2017124876A1 - Voice playing method and apparatus - Google Patents

Voice playing method and apparatus

Info

Publication number
WO2017124876A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio stream
training
playing
impulse
streaming media
Application number
PCT/CN2016/111636
Other languages
English (en)
French (fr)
Inventor
曾戟
Original Assignee
曾戟
Priority date
2016-01-20
Filing date
2016-12-23
Publication date
2017-07-27
Application filed by 曾戟

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Telephone Function (AREA)

Abstract

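A voice playing method and apparatus. The method comprises: acquiring an original audio stream containing at least one speaker (S101); training the original audio stream according to a preset training algorithm (S102); and loading the trained audio stream into a streaming media file for playback (S103). In this way, audio data with higher accuracy and less distortion can be played.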

Description

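Voice Playing Method and Apparatus
Technical Field
The present invention belongs to the field of speech recognition technology, and in particular relates to a voice playing method and apparatus.
Background
Voiceprint recognition is a recognition technique based on the human voice. Because the vocal organs people use when speaking differ to some degree, the voiceprint spectrograms of any two people's voices differ, so a voiceprint can serve as a biometric that characterizes individual differences: different individuals can be characterized by building voiceprint feature models, which are then used to identify them. At present, applying voiceprint feature models involves a dilemma, mainly in choosing the length of the training corpus. In general, the longer the voiceprint training corpus, the more accurate the resulting feature model and the higher the recognition accuracy, but the less practical the system; a shorter training corpus ensures better practicality but yields lower recognition accuracy. Practical applications such as voiceprint-based unlocking of a mobile phone screen require both high recognition accuracy, for security, and a training corpus that is not too long, for practicality.
In the existing method of building a voiceprint feature model, the user trains manually several times during the voiceprint enrollment phase, each time with a short utterance, and these utterances are finally combined into a longer training corpus from which the feature model is generated. However, requiring the user to manually record training speech of a certain length several times makes for a poor user experience and is not very practical; the combined training corpus is still limited in length, so a more accurate feature model cannot be generated and recognition accuracy cannot be improved further; and changes in speaking rate and intonation, mood swings, and the like also affect the accuracy of the model. Therefore, how to improve the accuracy of the voiceprint feature model, and thereby the recognition accuracy, while maintaining high practicality is a problem that urgently needs to be solved.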
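As an illustration of the conventional enrollment just described (several short utterances concatenated into one longer corpus, from which a feature model is built), the following Python sketch uses a deliberately simple, hypothetical feature model: the mean per-frame log energy and zero-crossing rate. The frame length, the features, and the function names are assumptions for illustration only; practical voiceprint systems use richer features (for example MFCCs) and statistical or neural speaker models.

```python
import math
from typing import List, Sequence

FRAME_LEN = 160  # hypothetical: 20 ms frames at an assumed 8 kHz sample rate


def frame_features(frame: Sequence[float]) -> List[float]:
    """Toy per-frame features: log energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [math.log(energy + 1e-12), crossings / (len(frame) - 1)]


def enroll(utterances: List[List[float]]) -> List[float]:
    """Concatenate short enrollment utterances into one training corpus and
    average the per-frame features into a single toy feature model."""
    corpus = [s for utt in utterances for s in utt]
    frames = [corpus[i:i + FRAME_LEN]
              for i in range(0, len(corpus) - FRAME_LEN + 1, FRAME_LEN)]
    if not frames:
        raise ValueError("training corpus is shorter than one frame")
    feats = [frame_features(f) for f in frames]
    return [sum(f[d] for f in feats) / len(feats) for d in range(len(feats[0]))]


# Usage: three short (0.1 s) synthetic "utterances" of a 200 Hz tone at 8 kHz.
utts = [[0.3 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
        for _ in range(3)]
model = enroll(utts)
print(model)
```

The longer the concatenated corpus, the more frames the averages are taken over, which is the toy analogue of the accuracy-versus-practicality trade-off described above.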
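Summary
On this basis, and in order to solve the above problems, a voice playing method and apparatus are provided.
A voice playing method, the method comprising:
acquiring an original audio stream containing at least one speaker;
training the original audio stream according to a preset training algorithm; and
loading the trained audio stream into a streaming media file for playback.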
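The three steps form a simple pipeline (numbered S101 to S103 in FIG. 1). The patent does not say how any step is implemented, so the hedged Python skeleton below only fixes the order of operations and takes each step as a caller-supplied function; the function and parameter names are hypothetical.

```python
from typing import Callable, List

Samples = List[float]


def voice_playing_method(
    acquire: Callable[[], Samples],
    train: Callable[[Samples], Samples],
    load_for_playback: Callable[[Samples], bytes],
) -> bytes:
    """Run the three claimed steps in order and return the media bytes."""
    original = acquire()                # S101: stream with at least one speaker
    trained = train(original)           # S102: the "preset training algorithm"
    return load_for_playback(trained)   # S103: load into a streaming media file


# Usage with trivial stand-ins for each step.
media = voice_playing_method(
    acquire=lambda: [0.0, 0.5],
    train=lambda s: [2 * x for x in s],
    load_for_playback=lambda s: bytes(int(100 * x) for x in s),
)
```

Keeping the steps as parameters lets concrete segmentation, matching, or encoding code be swapped in without changing the order the method prescribes.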
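In one embodiment, the method further comprises:
building a training sample library.
In one embodiment, training the original audio stream according to the preset training algorithm comprises:
segmenting the original audio stream to obtain a simulated audio stream and a real audio stream; and
applying impulse processing to the simulated audio stream and the real audio stream to obtain an impulse audio stream.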
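The patent does not define how the original stream is segmented or what distinguishes the simulated stream from the real one (the term 模拟 can also be read as "analog"). Purely as an assumption, the sketch below splits the stream into fixed-length frames and routes frames whose mean energy reaches a threshold to the real (speech) stream and the rest to the simulated stream; the frame length, threshold, and function name are hypothetical.

```python
from typing import List, Tuple


def split_streams(samples: List[float], frame_len: int = 160,
                  threshold: float = 1e-3) -> Tuple[List[float], List[float]]:
    """Hypothetical segmentation into (simulated, real) streams.

    ASSUMPTION: a frame counts as "real" audio when its mean energy is at
    least `threshold`; every other frame goes to the "simulated" stream.
    """
    simulated: List[float] = []
    real: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            real.extend(frame)
        else:
            simulated.extend(frame)
    return simulated, real


# Usage: a silent frame followed by a loud frame.
sim, real = split_streams([0.0] * 160 + [0.5] * 160)
```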
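In one embodiment, loading the trained audio stream into the streaming media file for playback comprises:
determining whether a matching sample object for the impulse audio stream is found in the training sample library; and
if a match is found, loading the impulse audio stream, as the trained audio stream, into the streaming media file for playback.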
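The patent does not specify what a "sample object" is or how a match is decided. A common approach, assumed here, is to store one feature vector per sample object in the training sample library and accept the closest one by cosine similarity when it clears a threshold. The class name, the feature vectors, and the 0.9 threshold are hypothetical; the playing step would proceed only when `match` returns a label.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SampleLibrary:
    """Hypothetical training sample library: label -> feature vector."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = {}

    def add(self, label: str, features: List[float]) -> None:
        self.samples[label] = features

    def match(self, features: List[float], threshold: float = 0.9) -> Optional[str]:
        """Return the best-matching label, or None if nothing reaches threshold."""
        best_label: Optional[str] = None
        best_score = threshold
        for label, ref in self.samples.items():
            score = cosine(features, ref)
            if score >= best_score:
                best_label, best_score = label, score
        return best_label


library = SampleLibrary()
library.add("speaker_a", [1.0, 0.0])
library.add("speaker_b", [0.0, 1.0])
matched = library.match([0.9, 0.1])
```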
In one embodiment, the method further comprises:
testing whether the streaming media file is distorted.
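The patent does not say how distortion is tested. One simple objective check, assumed here, compares the audio decoded from the streaming media file with a reference copy and computes the signal-to-noise ratio, SNR = 10·log10(signal energy / error energy). The 30 dB threshold and the function names are hypothetical, and a perceptual measure would give a more faithful test.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], decoded: Sequence[float]) -> float:
    """Signal-to-noise ratio of `decoded` against `reference`, in decibels."""
    if len(reference) != len(decoded):
        raise ValueError("streams must have the same length")
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf          # identical streams: no measurable distortion
    if signal == 0:
        return -math.inf         # silent reference: only error is present
    return 10 * math.log10(signal / noise)


def is_distorted(reference: Sequence[float], decoded: Sequence[float],
                 min_snr_db: float = 30.0) -> bool:
    """Flag the output as distorted when its SNR falls below `min_snr_db`."""
    return snr_db(reference, decoded) < min_snr_db


ref = [1.0, -1.0, 1.0, -1.0]
print(is_distorted(ref, ref))                      # False
print(is_distorted(ref, [0.9, -0.9, 0.9, -0.9]))   # True (about 20 dB)
```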
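A voice playing apparatus, the apparatus comprising:
an obtaining module, configured to acquire an original audio stream containing at least one speaker;
a training module, configured to train the original audio stream according to a preset training algorithm; and
a playing module, configured to load the trained audio stream into a streaming media file for playback.
In one embodiment, the apparatus further comprises:
a building module, configured to build a training sample library.
In one embodiment, the training module comprises:
a segmenting unit, configured to segment the original audio stream to obtain a simulated audio stream and a real audio stream; and
an impulse unit, configured to apply impulse processing to the simulated audio stream and the real audio stream to obtain an impulse audio stream.
In one embodiment, the playing module comprises:
a determining unit, configured to determine whether a matching sample object for the impulse audio stream is found in the training sample library; and
a playing unit, configured to, when a match is found, load the impulse audio stream, as the trained audio stream, into the streaming media file for playback.
In one embodiment, the apparatus further comprises:
a testing module, configured to test whether the streaming media file is distorted.
Beneficial Effects:
A voice playing method comprises: acquiring an original audio stream containing at least one speaker; training the original audio stream according to a preset training algorithm; and loading the trained audio stream into a streaming media file for playback. In this way, audio data with higher accuracy and less distortion can be played.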
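Brief Description of the Drawings
FIG. 1 is a flowchart of a voice playing method according to the present invention.
FIG. 2 is a module block diagram of a voice playing apparatus according to the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely intended to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below with reference to specific embodiments:
Embodiment 1:
As shown in FIG. 1, a voice playing method comprises:
S101: acquiring an original audio stream containing at least one speaker;
S102: training the original audio stream according to a preset training algorithm;
S103: loading the trained audio stream into a streaming media file for playback.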
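In this embodiment, the method further comprises:
building a training sample library.
In this embodiment, training the original audio stream according to the preset training algorithm comprises:
segmenting the original audio stream to obtain a simulated audio stream and a real audio stream; and
applying impulse processing to the simulated audio stream and the real audio stream to obtain an impulse audio stream.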
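"Impulse processing" (冲激) is not defined in the patent. One reading, adopted here only as an assumption, is that each stream is convolved with an impulse response h, that is y[n] = sum over k of x[k] * h[n - k], and the two results are summed sample by sample into a single impulse audio stream. The convolution reading, the summation, and the function names are all assumptions.

```python
from typing import List, Sequence


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def impulse_stream(simulated: Sequence[float], real: Sequence[float],
                   h: Sequence[float]) -> List[float]:
    """ASSUMPTION: convolve both streams with `h`, zero-pad the shorter
    result, and add them sample by sample."""
    a = convolve(simulated, h)
    b = convolve(real, h)
    n = max(len(a), len(b))
    a += [0.0] * (n - len(a))
    b += [0.0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]


# Usage: an impulse response with a direct path and a half-amplitude echo.
out = impulse_stream([1.0], [0.0, 1.0], h=[1.0, 0.5])
```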
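In this embodiment, loading the trained audio stream into the streaming media file for playback comprises:
determining whether a matching sample object for the impulse audio stream is found in the training sample library; and
if a match is found, loading the impulse audio stream, as the trained audio stream, into the streaming media file for playback.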
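The patent does not name a container format for the "streaming media file". As a stand-in, the sketch below uses Python's standard `wave` module to encode float samples in [-1, 1] as 16-bit mono PCM in an in-memory WAV file. The 16 kHz sample rate and the function name are assumptions; a streaming deployment would typically use a compressed, segmented format instead.

```python
import io
import struct
import wave
from typing import Sequence


def load_into_wav(samples: Sequence[float], sample_rate: int = 16000) -> bytes:
    """Encode float samples in [-1, 1] as 16-bit mono PCM inside an
    in-memory WAV container (a stand-in for the streaming media file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        pcm = b"".join(struct.pack("<h", round(s * 32767)) for s in clipped)
        wav.writeframes(pcm)
    return buf.getvalue()


media = load_into_wav([0.0, 0.5, -0.5, 1.0])
with wave.open(io.BytesIO(media), "rb") as check:
    print(check.getnframes(), check.getsampwidth(), check.getframerate())  # 4 2 16000
```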
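In this embodiment, the method further comprises:
testing whether the streaming media file is distorted.
Embodiment 2:
A voice playing apparatus comprises:
an obtaining module 201, configured to acquire an original audio stream containing at least one speaker;
a training module 202, configured to train the original audio stream according to a preset training algorithm; and
a playing module 203, configured to load the trained audio stream into a streaming media file for playback.
In this embodiment, the apparatus further comprises:
a building module, configured to build a training sample library.
In this embodiment, the training module comprises:
a segmenting unit, configured to segment the original audio stream to obtain a simulated audio stream and a real audio stream; and
an impulse unit, configured to apply impulse processing to the simulated audio stream and the real audio stream to obtain an impulse audio stream.
In this embodiment, the playing module comprises:
a determining unit, configured to determine whether a matching sample object for the impulse audio stream is found in the training sample library; and
a playing unit, configured to, when a match is found, load the impulse audio stream, as the trained audio stream, into the streaming media file for playback.
In this embodiment, the apparatus further comprises:
a testing module, configured to test whether the streaming media file is distorted.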
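The module structure of Embodiment 2 can be mirrored in code by composing one component per module. In the hedged sketch below, every module body is a caller-supplied placeholder (the patent gives none), the training sample library built by the building module is assumed to sit behind the `matches` callable, and the class and attribute names are hypothetical. The sketch only reports the testing module's verdict, since the patent does not say what happens when distortion is detected.

```python
from typing import Callable, List, Optional, Tuple

Samples = List[float]


class VoicePlaybackDevice:
    """Hypothetical composition of modules 201-203 plus the testing module."""

    def __init__(self,
                 obtain: Callable[[], Samples],
                 train: Callable[[Samples], Samples],
                 matches: Callable[[Samples], bool],
                 load: Callable[[Samples], bytes],
                 is_distorted: Callable[[bytes], bool]) -> None:
        self.obtain = obtain              # obtaining module 201
        self.train = train                # training module 202 (segmenting + impulse units)
        self.matches = matches            # determining unit of playing module 203
        self.load = load                  # playing unit of playing module 203
        self.is_distorted = is_distorted  # testing module

    def run(self) -> Optional[Tuple[bytes, bool]]:
        """Return (media, distorted), or None when no sample object matches."""
        impulse = self.train(self.obtain())
        if not self.matches(impulse):
            return None
        media = self.load(impulse)
        return media, self.is_distorted(media)


device = VoicePlaybackDevice(
    obtain=lambda: [0.1, 0.2],
    train=lambda s: s,
    matches=lambda s: len(s) > 0,
    load=lambda s: b"media",
    is_distorted=lambda m: False,
)
result = device.run()
```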
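It should be noted that the apparatus may be a terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or a vehicle-mounted computer. Taking a mobile phone as an example, the phone includes components such as an RF (Radio Frequency) circuit, a memory, an input unit, a display unit, sensors, an audio circuit, a WiFi (wireless fidelity) module, a processor, and a power supply. The RF circuit can be used to receive and send signals while information is being sent or received or during a call; in particular, after receiving downlink information from a base station, it passes the information to the processor for processing, and it sends uplink data to the base station. The RF circuit typically includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. The RF circuit can also communicate with networks and other devices through wireless communication, which may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
The memory can be used to store software programs and modules; the processor performs the phone's various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area can store the operating system and the applications required by at least one function (such as a sound playing function or an image playing function), and the data storage area can store data created through use of the phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit can be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the phone. Specifically, the input unit may include a touch panel and other input devices. The touch panel, also called a touch screen, can collect touch operations by the user on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal produced by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor, and can receive and execute commands sent by the processor. The touch panel can be implemented using various technologies, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel, the input unit may include other input devices, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, a joystick, and the like.
The display unit can be used to display information input by the user or provided to the user, as well as the phone's various menus. The display unit may include a display panel, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, it transmits the operation to the processor to determine the type of touch event, and the processor then provides the corresponding visual output on the display panel according to that type.
The phone may also include at least one sensor, such as a light sensor, a motion sensor, or other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel according to the ambient light, and the proximity sensor can turn off the display panel and/or the backlight when the phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the phone's posture (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may also be configured on the phone, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described further here.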
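The accelerometer behaviour described above (reading gravity while the phone is stationary and using it to switch between portrait and landscape) can be illustrated with a small sketch. It assumes an Android-style device frame (x along the screen's short edge, y along its long edge, z out of the screen) and readings in m/s²; the function name and the dominant-axis rule are illustrative assumptions, and real implementations add filtering and hysteresis.

```python
import math
from typing import Tuple


def gravity_and_orientation(ax: float, ay: float, az: float) -> Tuple[float, str]:
    """For a stationary phone the accelerometer measures gravity only.
    Return its magnitude and a coarse orientation from the dominant axis."""
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if abs(az) > max(abs(ax), abs(ay)):
        orientation = "flat"        # gravity mostly through the screen
    elif abs(ay) >= abs(ax):
        orientation = "portrait"    # gravity along the long edge
    else:
        orientation = "landscape"   # gravity along the short edge
    return magnitude, orientation
```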
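The audio circuit, loudspeaker, and microphone can provide an audio interface between the user and the phone. The audio circuit can convert received audio data into an electrical signal and transmit it to the loudspeaker, which converts it into a sound signal for output; conversely, the microphone converts a collected sound signal into an electrical signal, which the audio circuit receives and converts into audio data. After being processed by the processor, the audio data is sent through the RF circuit to, for example, another phone, or is output to the memory for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module, the phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access.
The phone further includes a power supply (such as a battery) that powers the components. Preferably, the power supply can be logically connected to the processor through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
Although not shown, the phone may further include a camera, a Bluetooth module, and the like, which are not described further here.
It should be noted that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The specific embodiments described above further illustrate the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.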

Claims (10)

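  1. A voice playing method, characterized in that the method comprises:
    acquiring an original audio stream containing at least one speaker;
    training the original audio stream according to a preset training algorithm; and
    loading the trained audio stream into a streaming media file for playback.
  2. The method according to claim 1, characterized in that the method further comprises:
    building a training sample library.
  3. The method according to claim 1 or 2, characterized in that training the original audio stream according to the preset training algorithm comprises:
    segmenting the original audio stream to obtain a simulated audio stream and a real audio stream; and
    applying impulse processing to the simulated audio stream and the real audio stream to obtain an impulse audio stream.
  4. The method according to claim 3, characterized in that loading the trained audio stream into the streaming media file for playback comprises:
    determining whether a matching sample object for the impulse audio stream is found in the training sample library; and
    if a match is found, loading the impulse audio stream, as the trained audio stream, into the streaming media file for playback.
  5. The method according to claim 1, characterized in that the method further comprises:
    testing whether the streaming media file is distorted.
  6. A voice playing apparatus, characterized in that the apparatus comprises:
    an obtaining module, configured to acquire an original audio stream containing at least one speaker;
    a training module, configured to train the original audio stream according to a preset training algorithm; and
    a playing module, configured to load the trained audio stream into a streaming media file for playback.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a building module, configured to build a training sample library.
  8. The apparatus according to claim 7, characterized in that the training module comprises:
    a segmenting unit, configured to segment the original audio stream to obtain a simulated audio stream and a real audio stream; and
    an impulse unit, configured to apply impulse processing to the simulated audio stream and the real audio stream to obtain an impulse audio stream.
  9. The apparatus according to claim 8, characterized in that the playing module comprises:
    a determining unit, configured to determine whether a matching sample object for the impulse audio stream is found in the training sample library; and
    a playing unit, configured to, when a match is found, load the impulse audio stream, as the trained audio stream, into the streaming media file for playback.
  10. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a testing module, configured to test whether the streaming media file is distorted.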

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610036759.4A CN105632489A (zh) 2016-01-20 2016-01-20 Voice playing method and apparatus
CN201610036759.4 2016-01-20

Publications (1)

Publication Number Publication Date
WO2017124876A1 (zh) 2017-07-27

Family

ID=56047336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111636 WO2017124876A1 (zh) 2016-01-20 2016-12-23 Voice playing method and apparatus

Country Status (2)

Country Link
CN (1) CN105632489A (zh)
WO (1) WO2017124876A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105632489A (zh) * 2016-01-20 2016-06-01 曾戟 Voice playing method and apparatus
CN113571054B (zh) * 2020-04-28 2023-08-15 中国移动通信集团浙江有限公司 Speech recognition signal preprocessing method, apparatus, device, and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
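CN101197131A (zh) * 2006-12-07 2008-06-11 积体数位股份有限公司 Random voiceprint password verification system, random voiceprint password lock, and method for generating the same
CN101321387A (zh) * 2008-07-10 2008-12-10 中国移动通信集团广东有限公司 Voiceprint recognition method and system based on a communication system
CN102446505A (zh) * 2010-10-15 2012-05-09 盛乐信息技术(上海)有限公司 Joint factor analysis method and joint factor analysis voiceprint authentication method
CN102760434A (zh) * 2012-07-09 2012-10-31 华为终端有限公司 Voiceprint feature model updating method and terminal
CN102781075A (zh) * 2011-05-12 2012-11-14 中兴通讯股份有限公司 Method for reducing call power consumption of a mobile terminal, and mobile terminal
CN105632489A (zh) * 2016-01-20 2016-06-01 曾戟 Voice playing method and apparatus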

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923857A (zh) * 2009-06-17 2010-12-22 复旦大学 Extensible speech recognition method based on human-computer interaction
CN104078051B (zh) * 2013-03-29 2018-09-25 南京中兴软件有限责任公司 Human voice extraction method and system, and human voice audio playing method and apparatus
US9665570B2 (en) * 2013-10-11 2017-05-30 International Business Machines Corporation Computer-based analysis of virtual discussions for products and services
KR20150145024A (ko) * 2014-06-18 2015-12-29 한국전자통신연구원 Terminal and server of a speaker-adaptive speech recognition system, and operating method thereof
CN105096941B (zh) * 2015-09-02 2017-10-31 百度在线网络技术(北京)有限公司 Speech recognition method and apparatus

Also Published As

Publication number Publication date
CN105632489A (zh) 2016-06-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16886136

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16886136

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 29..01.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16886136

Country of ref document: EP

Kind code of ref document: A1