WO2017206256A1 - Method for automatically adjusting speaking speed and terminal - Google Patents


Info

Publication number
WO2017206256A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
voice
speed
voice information
feature
Application number
PCT/CN2016/087741
Other languages
French (fr)
Chinese (zh)
Inventor
王晓军
Original Assignee
宇龙计算机通信科技(深圳)有限公司
Application filed by 宇龙计算机通信科技(深圳)有限公司
Publication of WO2017206256A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed in the present invention is a method for automatically adjusting speaking speed, comprising: acquiring input speech information; extracting speech characteristic information of the speech information; querying, from a speech database, the playing speed of the speech information corresponding to the speech characteristic information; and adjusting, according to the playing speed, the speed at which the speech information is played. The method can thus determine, from the speech characteristic information of speech information input in real time, a predetermined playing speed corresponding to that characteristic information, and adjust the speaking speed of the input speech information accordingly to suit the needs of different users; i.e., the present invention adaptively adjusts the playing speed according to the content of the speech information, and can be used in calls, playback by applications, and similar scenarios, giving it good adaptability. Further disclosed in the present invention is a terminal that can adaptively adjust the playing speed according to the content of speech information.

Description

Method and terminal for automatically adjusting speech rate
Technical field
The present invention relates to the field of communications technologies, and in particular to a method and terminal for automatically adjusting speech rate.
Background
Because people's hearing abilities differ, content played at a single speech rate will sound so fast to some listeners that they cannot follow it, and so slow to others that they feel it wastes their time. The speech rate of content played on a terminal therefore needs to be set according to each user's actual needs.
In the prior art, a speech-rate control is added to a mobile phone client application: the user selects a speech-rate level, and the phone plays voice content at that level. This approach has the following drawbacks. First, although the speech rate is divided into several levels, the level must be preset manually and cannot be adjusted dynamically, i.e., the speech rate is not adapted automatically. Second, the adjustment applies only to content played by the client software; the speech rate cannot be adjusted in real time during a call. Finally, it cannot adapt to other languages, i.e., it cannot adjust the speech rate according to the languages spoken by the two parties to a call. How to adjust the speech rate adaptively is therefore a technical problem that those skilled in the art need to solve.
Summary of the invention
An object of the present invention is to provide a method and terminal for automatically adjusting speech rate that determine, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that feature information, and adjust the speech rate of the input voice information according to that playback speed, so that the playback speed adapts to the content of the voice information.
To solve the above technical problem, the present invention provides a method for automatically adjusting speech rate, including:
acquiring input voice information;
extracting voice feature information of the voice information;
querying, from a voice database, a playback speed of the voice information corresponding to the voice feature information; and
adjusting, according to the playback speed, the speed at which the voice information is played.
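The four steps above can be read as a simple processing pipeline. The Python sketch below is illustrative only: the patent does not specify data types or interfaces, so `VoiceInput`, `adjust_speech_rate`, the single string feature key, and the dictionary standing in for the voice database are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class VoiceInput:
    """Step 1 output: acquired voice information (hypothetical container)."""
    samples: List[float]  # PCM samples
    sample_rate: int      # samples per second


def adjust_speech_rate(
    voice: VoiceInput,
    extract_features: Callable[[VoiceInput], str],
    voice_db: Dict[str, float],
    time_scale: Callable[[List[float], float], List[float]],
    default_speed: float = 1.0,
) -> Tuple[float, List[float]]:
    """Run steps 2-4 on already acquired voice information."""
    # Step 2: extract voice feature information, reduced here to a single key.
    feature_key = extract_features(voice)
    # Step 3: query the voice database; unknown features keep normal speed.
    speed = voice_db.get(feature_key, default_speed)
    # Step 4: adjust the voice information to the chosen playback speed.
    return speed, time_scale(voice.samples, speed)


# Usage with trivial stand-ins: a fixed feature label and a stretch that
# repeats each sample round(1 / speed) times (a crude placeholder only).
speed, stretched = adjust_speech_rate(
    VoiceInput(samples=[0.0, 1.0, 0.0, -1.0], sample_rate=8000),
    extract_features=lambda v: "impatient",
    voice_db={"impatient": 0.5},
    time_scale=lambda s, sp: [x for x in s for _ in range(round(1 / sp))],
)
```

Passing the feature extractor and the time-scaling step in as functions keeps the sketch agnostic about how each step is actually implemented.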
Extracting the voice feature information of the voice information includes:
identifying language feature information of the voice information; and/or
extracting at least one of speech rate information, feature word information, and audio information of the voice information.
Where the voice information is voice information of the local user, the method further includes:
acquiring physical sign information of the local user;
and querying, from the voice database, the playback speed of the voice information corresponding to the voice feature information includes:
querying, from the voice database, the playback speed of the voice information corresponding to both the voice feature information and the physical sign information.
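When physical sign information is also available, the lookup key simply grows to include it. The patent does not say which physical signs are used or how they are encoded, so the sketch below assumes, purely for illustration, a heart-rate reading bucketed into `normal` or `elevated` with a hypothetical 100 bpm cut-off, and falls back to a feature-only entry when no combined entry exists. `sign_label` and `query_speed` are hypothetical names.

```python
from typing import Dict, Tuple

# Hypothetical table keyed by (voice feature label, physical-sign label).
SpeedTable = Dict[Tuple[str, str], float]


def sign_label(heart_rate_bpm: float) -> str:
    """Bucket one vital sign; the 100 bpm cut-off is a placeholder."""
    return "elevated" if heart_rate_bpm > 100 else "normal"


def query_speed(
    table: SpeedTable,
    feature: str,
    heart_rate_bpm: float,
    feature_only: Dict[str, float],
    default: float = 1.0,
) -> float:
    """Look up the speed for a voice feature combined with a physical sign."""
    key = (feature, sign_label(heart_rate_bpm))
    if key in table:
        return table[key]  # exact match on feature and physical sign
    # No combined entry: fall back to the feature alone, then to normal speed.
    return feature_only.get(feature, default)
```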
Querying, from the voice database, the playback speed of the voice information corresponding to the voice feature information and the physical sign information further includes:
updating, by means of a machine learning algorithm and using the voice feature information and the physical sign information, the playback-speed correspondences in the voice database.
Adjusting the speed at which the voice information is played according to the playback speed includes:
resampling the digital signal of the voice information by interpolation or decimation, so that the time scale of the voice information is adjusted to reach the playback speed.
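The resampling step can be stated concretely. Writing the input samples as $x[n]$, $n = 0,\dots,N-1$, at sample rate $f_s$ (duration $T = N/f_s$), and the target playback speed as a factor $v > 0$ ($v > 1$ faster, $v < 1$ slower), one common choice, assumed here rather than taken from the patent, is linear interpolation at fractional read positions $m v$:

```latex
\begin{aligned}
N' &= \left\lfloor \frac{N-1}{v} \right\rfloor + 1, \qquad m = 0,\dots,N'-1,\\
k_m &= \lfloor m v \rfloor, \qquad \alpha_m = m v - k_m,\\
y[m] &= (1-\alpha_m)\,x[k_m] + \alpha_m\,x[k_m + 1] \quad (\text{with } y[m] = x[N-1] \text{ when } k_m = N-1),\\
T' &= \frac{N'}{f_s} \approx \frac{T}{v}.
\end{aligned}
```

For $v < 1$ this inserts interpolated samples (interpolation); for $v > 1$ it skips samples (decimation), where a low-pass filter is normally applied first to avoid aliasing. Because $y$ is played at the original rate $f_s$, every frequency component, and therefore the pitch, is also scaled by $v$.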
The present invention further provides a terminal, including:
a voice information acquisition module, configured to acquire input voice information;
a voice feature extraction module, configured to extract voice feature information of the voice information;
a playback speed determination module, configured to query, from a voice database, a playback speed of the voice information corresponding to the voice feature information; and
a playback speed adjustment module, configured to adjust, according to the playback speed, the speed at which the voice information is played.
The voice feature extraction module includes:
a first voice feature extraction unit, configured to identify language feature information of the voice information; and/or
a second voice feature extraction unit, configured to extract at least one of speech rate information, feature word information, and audio information of the voice information.
Where the voice information is voice information of the local user, the terminal further includes:
a physical sign information acquisition module, configured to acquire physical sign information of the local user.
The terminal further includes:
a machine learning module, configured to update, by means of a machine learning algorithm and using the voice feature information and the physical sign information, the playback-speed correspondences in the voice database.
The playback speed adjustment module is specifically a module that resamples the digital signal of the voice information by interpolation or decimation and adjusts the time scale of the voice information to reach the playback speed.
The method for automatically adjusting speech rate provided by the present invention includes: acquiring input voice information; extracting voice feature information of the voice information; querying, from a voice database, a playback speed of the voice information corresponding to the voice feature information; and adjusting, according to the playback speed, the speed at which the voice information is played.
The method can therefore determine, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that feature information, and adjust the speech rate of the input voice information accordingly to suit the needs of different users. In other words, the playback speed adapts to the content of the voice information, and because the method can be used both in calls and in playback by applications, it is widely applicable. The present invention also provides a terminal with the same beneficial effects, which are not repeated here.
Description of drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for automatically adjusting speech rate according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a terminal according to an embodiment of the present invention;
FIG. 3 is a structural block diagram of another terminal according to an embodiment of the present invention;
FIG. 4 is a structural block diagram of yet another terminal according to an embodiment of the present invention.
Detailed description
The core of the present invention is to provide a method and terminal for automatically adjusting speech rate that determine, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that feature information, and adjust the speech rate of the input voice information according to that playback speed, so that the playback speed adapts to the content of the voice information.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, which is a flowchart of a method for automatically adjusting speech rate according to an embodiment of the present invention: in this embodiment the method is executed by a terminal, which may be a mobile phone. The method may include:
S100: Acquire input voice information.
The voice information may be acquired by monitoring the call service and any application that provides voice playback. That is, it may be the voice of the local user while making or answering a call, the voice of the peer user while making or answering a call, or voice played by an application with a voice playback function.
S110: Extract voice feature information of the voice information.
The types and number of voice features extracted can be chosen according to the user's actual needs, as long as the voice feature information contained in the voice information can be used to adjust the playback speed of the acquired voice information according to a preset standard, thereby adjusting the speech rate automatically. For example, the voice feature information may include emotion, language, phonetic features, speech rate, and intonation.
S120: Query, from a voice database, the playback speed of the voice information corresponding to the voice feature information.
Once the voice features to be extracted have been determined, the user may preset a playback speed for each type of voice feature information, or have several types jointly determine a single playback speed. These correspondences may be stored in the voice database as a correspondence list or as a mapping table. The user may also modify, delete, or add correspondences in the voice database as circumstances change, so that the playback speed associated with each configured voice feature stays up to date and meets the user's actual needs.
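A correspondence list or mapping table of this kind maps naturally onto a dictionary. The sketch below is a hypothetical `SpeedMap` (not an API from the patent) in which a key is a set of one or more feature labels, so that a single feature or several features together can determine one playback speed, and the user-facing add, modify, and delete operations are plain methods. When several stored entries match, the most specific one (the one naming the most features) wins; that tie-breaking rule is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, Iterable, Optional


class SpeedMap:
    """Hypothetical mapping table: a set of feature labels -> playback speed."""

    def __init__(self) -> None:
        self._table: Dict[FrozenSet[str], float] = {}

    def put(self, features: Iterable[str], speed: float) -> None:
        """Add a new correspondence or modify an existing one."""
        if speed <= 0:
            raise ValueError("playback speed must be positive")
        self._table[frozenset(features)] = speed

    def delete(self, features: Iterable[str]) -> None:
        """Remove a correspondence; missing entries are ignored."""
        self._table.pop(frozenset(features), None)

    def lookup(self, observed: Iterable[str]) -> Optional[float]:
        """Return the speed of the most specific entry whose features were
        all observed, or None if no entry matches."""
        present = frozenset(observed)
        best: Optional[float] = None
        best_size = 0
        for key, speed in self._table.items():
            # key <= present: every feature named by the entry was observed.
            if key <= present and len(key) > best_size:
                best, best_size = speed, len(key)
        return best
```

Among equally specific matches, the entry inserted first is kept, because the comparison uses a strict `>`.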
Querying the voice database may also include comparing the extracted voice feature information with the value ranges defined for that type of voice feature in the voice database, determining which range the extracted value falls in, and then taking the preset playback speed for that range. The user may also modify the range boundaries for a voice feature, or the preset playback speed for each range, according to actual needs, to suit personal preferences and improve the user experience.
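The range comparison described above amounts to locating a value among sorted boundaries. This sketch assumes, for illustration only, one numeric feature with hand-picked boundaries; `RangeSpeedTable` and its parameters are hypothetical. Python's standard `bisect.bisect_right` returns the index of the range the value falls in, with each range closed on the left.

```python
import bisect
from typing import List


class RangeSpeedTable:
    """Hypothetical range table for one numeric voice feature.

    k strictly increasing boundaries define k + 1 ranges:
    (-inf, b0), [b0, b1), ..., [b_{k-1}, +inf), each with its own speed.
    """

    def __init__(self, boundaries: List[float], speeds: List[float]) -> None:
        if len(speeds) != len(boundaries) + 1:
            raise ValueError("need exactly one speed per range")
        if any(a >= b for a, b in zip(boundaries, boundaries[1:])):
            raise ValueError("boundaries must be strictly increasing")
        self.boundaries = list(boundaries)
        self.speeds = list(speeds)

    def lookup(self, value: float) -> float:
        """Return the preset speed of the range that contains value."""
        # bisect_right counts boundaries <= value, i.e. the range index.
        return self.speeds[bisect.bisect_right(self.boundaries, value)]

    def set_speed(self, range_index: int, speed: float) -> None:
        """Let the user edit the preset speed of one range."""
        if not 0 <= range_index < len(self.speeds):
            raise IndexError("no such range")
        self.speeds[range_index] = speed
```

Editing a boundary works the same way: replace `self.boundaries` with a new strictly increasing list of the same length.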
S130: Adjust the speed at which the voice information is played according to the playback speed.
The voice information is adjusted to the obtained playback speed. The specific adjustment method is not limited here, as long as the acquired voice information can be played back at the corresponding playback speed. One specific speech-rate adjustment process is as follows: the digital signal of the voice information is resampled by interpolation or decimation, and the time scale of the voice information is adjusted to reach the playback speed. That is, resampling the digital signal by interpolation or decimation lengthens or shortens the time scale of the speech, thereby changing the speech rate.
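The interpolation/decimation step can be sketched in a few lines of pure Python. This is one possible reading, assuming linear interpolation between neighboring samples; the patent does not name an interpolation kernel, and `resample_time_scale` is a hypothetical name. A speed above 1 reads the input faster and produces fewer samples (decimation); a speed below 1 produces more samples (interpolation).

```python
import math
from typing import List


def resample_time_scale(samples: List[float], speed: float) -> List[float]:
    """Change duration by resampling with linear interpolation.

    speed > 1 shortens the signal (decimation), speed < 1 lengthens it
    (interpolation). Played at the original sample rate, pitch also
    scales by `speed`.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n = len(samples)
    if n < 2:
        return list(samples)
    out_len = math.floor((n - 1) / speed) + 1
    out: List[float] = []
    for m in range(out_len):
        pos = m * speed          # fractional read position in the input
        i = int(pos)
        frac = pos - i
        if i >= n - 1:           # last sample (also guards float round-off)
            out.append(samples[n - 1])
        else:
            out.append((1 - frac) * samples[i] + frac * samples[i + 1])
    return out


# Half speed: the output is roughly twice as long.
slowed = resample_time_scale([0.0, 1.0, 2.0, 3.0], 0.5)
```

The output is shorter or longer by the factor `speed`, but its pitch shifts by the same factor when played at the original sample rate; a production system would also low-pass filter before decimating to avoid aliasing.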
For example, calling is a basic and important function of a mobile phone. However, some people speak quickly and some listeners have poor hearing, which makes communication difficult. During a call, the method collects voice feature information such as the emotion, language, and phonetic features of both parties from the acquired input voice information and compares it with the information in the voice database. If the speech rate is too fast, or the peer gives abnormal feedback, the playback speed corresponding to that speech rate or to that abnormal feedback is determined, and the digital signal is resampled by interpolation or decimation to lengthen or shorten the time scale of the speech, thereby changing the speech rate. In this way, the speed of the sound played from the earpiece is adjusted automatically according to factors such as the language and emotional changes of the local or peer user during the call, to suit the needs of different groups of people.
Optionally, the voice database is learned and updated by a machine learning algorithm.
Maintaining the voice database in the terminal allows the user's voice feature parameters to be stored, so that a machine learning algorithm can take those parameters as input and update the voice database. The database can thus be adjusted according to the long-term usage habits of different user groups rather than strictly following the original preset data, which gives better adaptability.
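The patent leaves the machine learning algorithm unspecified. As a deliberately simple stand-in (not the patent's method), the sketch below nudges a stored playback speed toward the speed a user actually settles on for a given feature, using an exponential moving average. `OnlineSpeedLearner`, `observe`, and `learning_rate` are hypothetical names, and the sketch assumes the terminal can observe such preferred speeds, for example from manual corrections.

```python
from typing import Dict


class OnlineSpeedLearner:
    """Hypothetical stand-in for the unspecified learning algorithm:
    an exponential moving average of the speeds a user settles on."""

    def __init__(self, initial: Dict[str, float], learning_rate: float = 0.2) -> None:
        if not 0 < learning_rate <= 1:
            raise ValueError("learning_rate must be in (0, 1]")
        self.table = dict(initial)  # feature key -> current playback speed
        self.lr = learning_rate

    def observe(self, feature_key: str, preferred_speed: float) -> float:
        """Move the stored speed a fraction lr toward the observed preference."""
        old = self.table.get(feature_key, preferred_speed)  # new keys start at the observation
        new = old + self.lr * (preferred_speed - old)
        self.table[feature_key] = new
        return new
```

A small learning rate keeps the table stable against one-off corrections while still following long-term habits.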
A specific implementation of the above example may be as follows:
When the local user, i.e., the calling user, is in a hurry to say something or is emotionally agitated, and the words and phrases in the voice content match the database definition of an "impatient" user, the speech rate of the acquired input voice information is reduced to the playback speed associated with "impatient". This has a calming effect and lets users make calls more efficiently and amicably.
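Matching the words of an utterance against a stored definition of a state such as "impatient" could look like the following. It assumes a speech-to-text transcript is already available and uses made-up English trigger words and a hypothetical two-match threshold; the patent does not specify the vocabulary, the matching rule, or the speed values. The simple regex tokenizer only suits English; Chinese text would need a word segmenter.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical state definitions: trigger words and playback speed per state.
STATE_KEYWORDS: Dict[str, Set[str]] = {
    "impatient": {"hurry", "quickly", "now", "immediately"},
}
STATE_SPEED: Dict[str, float] = {"impatient": 0.8}


def detect_state(transcript: str, min_hits: int = 2) -> Optional[str]:
    """Return the state whose trigger words occur most often (at least
    min_hits times) in the transcript, or None if no state qualifies."""
    words = re.findall(r"[a-z']+", transcript.lower())
    best: Optional[str] = None
    best_hits = 0
    for state, keywords in STATE_KEYWORDS.items():
        hits = sum(1 for w in words if w in keywords)
        if hits >= min_hits and hits > best_hits:
            best, best_hits = state, hits
    return best


def speed_for_transcript(transcript: str, default: float = 1.0) -> float:
    """Map the detected state to its playback speed (normal speed otherwise)."""
    state = detect_state(transcript)
    if state is None:
        return default
    return STATE_SPEED.get(state, default)
```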
As another example, when the calling user speaks English and the voice feature information identifies the language as English, the speech rate of the input voice information is adjusted to the playback speed configured for English. After this adjustment, the called user, i.e., the peer user, hears slowed-down speech, which to some extent alleviates the listening difficulty users face when communicating in a language other than their native one.
Based on the above technical solution, the method for automatically adjusting speech rate provided by this embodiment determines, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that feature information, and adjusts the speech rate of the input voice information accordingly to suit the needs of different users. The playback speed thus adapts to the content of the voice information, and because the method can be used in calls and in playback by applications, it is widely applicable. Different users can have the voice playback speed adapted to their own needs, improving the user experience.
Based on the above embodiment, this embodiment adaptively adjusts the playback speed of the voice information according to the language of the input voice information, using the playback speed configured for each language; that is, the playback speed adapts to the language spoken. Preferably, extracting the voice feature information of the voice information specifically comprises:
identifying language feature information of the voice information.
By recognizing the acquired input voice information, its language feature information is obtained; this may include audio parameters and feature word information. The playback speed of the voice information is then determined from the preset playback speed corresponding to that language feature information. The user may set a playback speed for every language individually, for a predetermined number of languages only, or for a few broad categories of languages only. In the last case, the language feature information may be the category itself, or the identified language may first be assigned to a category whose playback speed is then used. The correspondence between languages and playback speeds may be implemented as a correspondence list or a mapping table.
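The three configuration options (per language, for a predetermined set of languages, or per language category) can be combined in one lookup, with an explicit per-language setting taking priority over its category. The function name, the language codes, and the speed values below are illustrative assumptions, and the sketch assumes language identification has already produced a language code.

```python
from typing import Dict


def speed_for_language(
    lang: str,
    per_language: Dict[str, float],
    language_category: Dict[str, str],
    per_category: Dict[str, float],
    default: float = 1.0,
) -> float:
    """Resolve a playback speed for an identified language code."""
    # 1) An explicit per-language setting takes priority.
    if lang in per_language:
        return per_language[lang]
    # 2) Otherwise use the speed configured for the language's category.
    category = language_category.get(lang)
    if category is not None and category in per_category:
        return per_category[category]
    # 3) Unconfigured language: play at normal speed.
    return default
```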
Language feature information may be identified by a user language recognition system together with a language text translation system, which synthesize a "reference speech" for each of the user's languages, and by using segment- and syllable-based Markov models, pitch contours, formant vectors, acoustic features, dialectal phonemes and prosodic features, and the original speech waveform features. Classification methods may include HMMs, expert systems, clustering algorithms, two-stage classification, and artificial neural networks.
The above embodiment is illustrated below with several specific application scenarios:
When input voice information is detected from an application on the terminal, the acquired voice information is recognized. If the language feature information indicates English, the playback speed the user has preset for English is determined, and the speech rate of the voice information is adjusted to that playback speed. English is used here only as an example.
在用户进行通话时,可以仅检测本端用户的语音信息的语种,也可以仅检测对端用户的语音信息的语种,也可以检测本端用户及对端用户的语音信息的语种;下面以最后一种情况为例进行说明:When the user is in a call, the language of the local user's voice information may be detected, or only the language of the voice information of the peer user may be detected, or the language of the voice information of the local user and the peer user may be detected; A case is illustrated as an example:
开始时手机处于正常通信状态,主被叫已经接通。语音信息获取模块获取输入的语音信息;语音特征提取模块对双方的音频参数以及关键词句进行提取。播放速度确定模块将提取到的音频参数解析,查询语音数据库并进行语种判断,根据语种确定用户预设的播放速度。播放速度调节模块对语音信息进行时间上的拉长或缩短处理。听筒播放经过处理的语音信息。双方挂断电话,通话完成。At the beginning, the mobile phone is in normal communication state and the main called party is connected. The voice information acquiring module acquires the input voice information; the voice feature extraction module extracts the audio parameters and the keyword sentences of both parties. The play speed determination module parses the extracted audio parameters, queries the voice database and performs language judgment, and determines the preset play speed of the user according to the language. The playback speed adjustment module temporally lengthens or shortens the voice information. The handset plays the processed voice message. Both parties hang up and the call is completed.
In this embodiment, users can assess their own listening ability in each language and set the playback speed accordingly, which addresses the listening difficulty users face when communicating in a language other than their native one.
Building on any of the embodiments above, this embodiment targets voice communication between users, during which a speaker may talk too fast or become emotionally agitated. To keep communication smooth in such situations, the user's state is determined from the voice feature information of the user's voice information, and the playback speed set for that state is used; that is, the playback speed adapts to the user's speaking state. Preferably, extracting the voice feature information of the voice information specifically comprises:
extracting at least one of speech rate information, feature word information, and audio information from the voice information.
This first requires determining which user state each kind of voice feature information corresponds to or reflects, and then deciding what playback speed should be set for that state. The determination may be based on speech rate information alone, on feature word information alone, and so on; that is, speech rate information, feature word information, and audio information may be combined in any way.
When a single feature is used, cases are classified by that feature and a playback speed is set for each class. Take speech rate as an example: users who are impatient generally speak too fast, so when the speech rate exceeds a certain threshold the user can be considered impatient, and the voice information is played at the preset playback speed for the impatient state. Alternatively, the speech rate can be divided into several ranges, each with its own playback speed.
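The range-based variant can be sketched as follows, assuming speech rate is measured in syllables per second from a recognized syllable count and the utterance duration. The boundaries and speeds are hypothetical; the patent does not give concrete values:

```python
from bisect import bisect_right

# Hypothetical speech-rate boundaries (syllables per second) and the playback
# multiplier for each resulting range; len(RANGE_SPEEDS) == len(RATE_BOUNDARIES) + 1.
RATE_BOUNDARIES = [3.0, 5.0, 7.0]
RANGE_SPEEDS = [1.1, 1.0, 0.9, 0.75]

def speech_rate(syllable_count: int, duration_s: float) -> float:
    """Speech rate in syllables per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return syllable_count / duration_s

def speed_for_rate(rate_sps: float) -> float:
    """Map a speech rate to the playback speed of the range it falls into."""
    # bisect_right places a rate equal to a boundary in the upper range.
    return RANGE_SPEEDS[bisect_right(RATE_BOUNDARIES, rate_sps)]
```

For instance, 16 syllables in 2 seconds gives 8.0 syllables per second, which falls in the fastest range and maps to 0.75. The single-threshold case in the text is the special case of one boundary and two speeds.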
To make the adjustment more accurate, speech rate information, feature word information, and audio information are preferably used together, so that the playback speed is determined from all three features. For example, an impatient user typically speaks too fast, uses certain specific words (users can register the words they habitually use when impatient), and speaks louder or at a higher pitch. If all three cues, or at least two of them, are present, the user can be considered impatient, and the voice information is played at the preset playback speed for the impatient state.
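The "at least two of three cues" rule can be sketched as a simple vote. The thresholds, the dBFS loudness measure, and the function names are assumptions for illustration only:

```python
from dataclasses import dataclass
from typing import Iterable, Set

@dataclass(frozen=True)
class Thresholds:
    # Hypothetical cut-offs; the patent does not give concrete values.
    max_rate_sps: float = 6.0     # speech rate, syllables per second
    max_level_db: float = -20.0   # speech level, dBFS

def count_impatience_cues(rate_sps: float, words: Iterable[str], level_db: float,
                          trigger_words: Set[str],
                          t: Thresholds = Thresholds()) -> int:
    """Count how many of the three cues (fast speech, trigger word, loud voice) are present."""
    cues = (
        rate_sps > t.max_rate_sps,
        any(w in trigger_words for w in words),
        level_db > t.max_level_db,
    )
    return sum(cues)

def is_impatient(rate_sps: float, words: Iterable[str], level_db: float,
                 trigger_words: Set[str], min_cues: int = 2) -> bool:
    """Treat the speaker as impatient when at least `min_cues` cues are present."""
    return count_impatience_cues(rate_sps, words, level_db, trigger_words) >= min_cues
```

Setting `min_cues=3` gives the stricter "all three cues" variant mentioned in the text.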
The speech rate information, feature word information, and audio information in this embodiment can be freely combined with the language feature information, for example by setting a playback speed for each speech rate range in English and for each speech rate range in Chinese.
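Combining language and speech rate turns the lookup into a two-level table: first by language, then by rate range. A minimal sketch with hypothetical boundaries and speeds per language:

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical table: language -> (rate boundaries in syllables/s, speed per range).
SPEED_TABLE: Dict[str, Tuple[List[float], List[float]]] = {
    "en": ([3.0, 5.0], [1.0, 0.85, 0.7]),
    "zh": ([4.0, 6.0], [1.0, 1.0, 0.85]),
}

def playback_speed(language: str, rate_sps: float, default: float = 1.0) -> float:
    """Look up the speed for a (language, speech-rate range) pair; unknown languages use `default`."""
    entry = SPEED_TABLE.get(language)
    if entry is None:
        return default
    boundaries, speeds = entry
    return speeds[bisect_right(boundaries, rate_sps)]
```

Keeping separate boundaries per language lets a user who is less fluent in English slow English speech down earlier than Chinese speech.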
With the embodiment above, the speech rate during a call can be adjusted adaptively, so different users can change the voice playback speed to suit their own needs, improving the user experience.
Building on any of the embodiments above, this embodiment aims to determine the local user's state more accurately, and thus the playback speed for the local user in that state, so that the playback speed adapts to the local user's speaking state. Here the voice information is the local user's voice information, and the method may further include:
acquiring vital-sign information of the local user;
correspondingly, querying the voice database for the playback speed of the voice information corresponding to the voice feature information comprises:
querying the voice database for the playback speed of the voice information corresponding to both the voice feature information and the vital-sign information.
The embodiment above can determine the user's state from speech rate information, feature word information, and audio information. To determine more accurately whether the local user is actually in that state, the local user's vital-sign information, such as body temperature and pulse, can also be acquired. Vital-sign information can be collected by a smart wearable device paired with the terminal, such as a smart band.
For example, when the local user (the caller) is eager to say something or is emotionally agitated, the words and phrases in the voice information match the database's definition of an impatient user, and the smart band reports an elevated pulse, the user can be determined to be impatient, and the speech rate of the acquired input voice information is reduced to the playback speed set for the impatient state. This has a calming effect and lets the user make phone calls more efficiently and amicably. The process can be as follows:
The mobile phone is in a normal communication state and the calling and called parties are connected. The user's voice information is collected, and the user's body temperature, pulse, and similar information during the call are collected through the smart band. The voice database is queried, and changes in body temperature and pulse during the call are combined with the use of key words and phrases (feature word information) to judge whether the user is emotionally agitated. The speech rate information is then used to judge whether adjustment is needed. If the adjustment condition is met, the adjustment is made according to a preset value in the voice database, which determines the new playback speed. The voice data are stretched or compressed in time, and the earpiece plays the processed voice data. The user's emotional change information and characteristic phrases from this call can also be written to the voice database to improve subsequent emotion judgments.
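The decision in this process can be sketched as two checks: an agitation check (trigger words plus an elevated pulse from the wearable) and a speech-rate check. The pulse ratio, rate limit, and speed values are hypothetical placeholders for the voice database's presets; body temperature could be added as a further condition in the same way:

```python
# Hypothetical parameters; the patent leaves these to presets in the voice database.
PULSE_RISE_RATIO = 1.2   # pulse at least 20% above the user's resting pulse
RATE_LIMIT_SPS = 6.0     # speech rate (syllables/s) above which adjustment applies
AGITATED_SPEED = 0.8     # preset playback multiplier for the agitated state
NORMAL_SPEED = 1.0

def is_agitated(pulse_bpm: float, resting_pulse_bpm: float, used_trigger_words: bool) -> bool:
    """Agitated when trigger words are used and the wearable reports an elevated pulse."""
    return used_trigger_words and pulse_bpm >= PULSE_RISE_RATIO * resting_pulse_bpm

def choose_playback_speed(pulse_bpm: float, resting_pulse_bpm: float,
                          used_trigger_words: bool, rate_sps: float) -> float:
    """Use the agitated-state speed only if the user is agitated and also speaking fast."""
    if is_agitated(pulse_bpm, resting_pulse_bpm, used_trigger_words) and rate_sps > RATE_LIMIT_SPS:
        return AGITATED_SPEED
    return NORMAL_SPEED
```

Requiring both the physiological and the speech-rate condition mirrors the text's order: first judge agitation, then judge from the speech rate whether adjustment is needed.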
Building on any of the embodiments above, this embodiment mainly improves the accuracy of the voice database; accordingly, the method further comprises:
updating, by a machine learning algorithm using the voice feature information and the vital-sign information, the correspondence between features and playback speeds in the voice database.
A voice database is maintained on the terminal and can store the user's audio parameters, giving the speech rate adjustment a learning capability. Adjustment can then follow the long-term usage habits of different user groups rather than relying entirely on the original preset data, which makes it more adaptable. With this learning capability, the key phrases the user frequently uses (feature word information) are continually updated to improve subsequent judgments of the user's emotional state.
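The patent does not name a specific machine learning algorithm. As one simple stand-in, the sketch below (a hypothetical in-memory `VoiceDatabase`, not the patent's design) nudges the stored speed for a state toward the speed the user actually preferred, using an exponential moving average, and counts words seen in that state to maintain the feature word list:

```python
from collections import Counter
from typing import Dict, Iterable, Set

class VoiceDatabase:
    """Toy in-memory stand-in for the terminal's voice database (hypothetical)."""

    def __init__(self, speeds: Dict[str, float], learning_rate: float = 0.2):
        self.speeds = dict(speeds)             # user state -> playback speed multiplier
        self.learning_rate = learning_rate     # how fast stored speeds follow new evidence
        self.word_counts: Counter = Counter()  # words observed while in a given state

    def update_speed(self, state: str, preferred_speed: float) -> float:
        """Move the stored speed for `state` toward the speed the user preferred."""
        old = self.speeds.get(state, 1.0)
        new = old + self.learning_rate * (preferred_speed - old)
        self.speeds[state] = new
        return new

    def record_words(self, words: Iterable[str]) -> None:
        """Accumulate words spoken in the current state."""
        self.word_counts.update(words)

    def frequent_words(self, min_count: int = 3) -> Set[str]:
        """Words seen at least `min_count` times become candidate feature words."""
        return {w for w, c in self.word_counts.items() if c >= min_count}
```

A small learning rate makes the stored presets change slowly, so a single unusual call does not override long-term habits.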
Based on the technical solution above, the method for automatically adjusting speech rate provided by the embodiments of the present invention determines, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that feature information, and adjusts the speech rate of the input voice information to that playback speed to meet the needs of different users. In other words, the playback speed adapts to the content of the voice information. The method can be used both in calls and in application playback, which makes it widely applicable and lets different users adapt the voice playback speed to their own needs, improving the user experience.
The embodiments of the present invention provide a method for automatically adjusting speech rate that determines, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that feature information, and adjusts the speech rate of the input voice information according to that playback speed.
The terminal provided by the embodiments of the present invention is described below; the terminal described below and the method for automatically adjusting speech rate described above correspond to each other and may be read together.
Referring to FIG. 2, which is a structural block diagram of a terminal according to an embodiment of the present invention, the terminal may include:
a voice information acquisition module 100, configured to acquire input voice information;
a voice feature extraction module 200, configured to extract voice feature information from the voice information;
a playback speed determination module 300, configured to query the voice database for the playback speed of the voice information corresponding to the voice feature information;
a playback speed adjustment module 400, configured to adjust the speed at which the voice information is played according to the playback speed.
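The four modules form a linear pipeline. A minimal sketch of how they could be composed, assuming each module is a pluggable callable and audio is a plain list of samples (the class and field names are hypothetical, not from the patent):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Samples = List[float]

@dataclass
class SpeechRateTerminal:
    """Hypothetical composition of modules 100-400; each stage is a pluggable callable."""
    acquire: Callable[[], Samples]                             # module 100: voice acquisition
    extract_features: Callable[[Samples], Dict[str, object]]   # module 200: feature extraction
    lookup_speed: Callable[[Dict[str, object]], float]         # module 300: voice database query
    adjust: Callable[[Samples, float], Samples]                # module 400: speed adjustment

    def process(self) -> Samples:
        """Run one pass of the pipeline: acquire, extract, look up, adjust."""
        voice = self.acquire()
        features = self.extract_features(voice)
        speed = self.lookup_speed(features)
        return self.adjust(voice, speed)
```

Usage with toy stand-ins, where halving the speed repeats each sample to double the length:

```python
terminal = SpeechRateTerminal(
    acquire=lambda: [0.1] * 8,
    extract_features=lambda s: {"language": "en"},
    lookup_speed=lambda f: 0.5 if f["language"] == "en" else 1.0,
    adjust=lambda s, v: [x for x in s for _ in range(round(1 / v))],
)
```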
Optionally, the voice feature extraction module 200 includes:
a first voice feature extraction unit, configured to identify language feature information of the voice information; and/or
a second voice feature extraction unit, configured to extract at least one of speech rate information, feature word information, and audio information from the voice information.
Optionally, referring to FIG. 3, where the voice information is the local user's voice information, the terminal further includes:
a vital-sign information acquisition module 500, configured to acquire vital-sign information of the local user.
In this case, the playback speed determination module 300 is specifically a module that queries the voice database for the playback speed of the voice information corresponding to both the voice feature information and the vital-sign information.
Optionally, referring to FIG. 4, the terminal further includes:
a machine learning module 600, configured to update, by a machine learning algorithm using the voice feature information and the vital-sign information, the correspondence between features and playback speeds in the voice database.
Optionally, the playback speed adjustment module 400 is specifically a module that resamples the digital signal of the voice information by interpolation or decimation so as to adjust the time scale of the voice information to the playback speed.
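A minimal sketch of time-scale adjustment by resampling, assuming linear interpolation between neighboring samples (the patent does not specify the interpolation kernel). Note that resampling alone, played back at the original sample rate, shifts pitch by the same factor as duration; keeping pitch constant would need a different technique such as WSOLA. A practical decimator would also low-pass filter first to avoid aliasing, which this sketch omits:

```python
from typing import List, Sequence

def resample_time_scale(samples: Sequence[float], speed: float) -> List[float]:
    """Linearly resample so playback at the original sample rate runs `speed` times faster.

    speed < 1.0 lengthens the signal (interpolation); speed > 1.0 shortens it (decimation).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    n_out = max(1, round(n_in / speed))
    out: List[float] = []
    for i in range(n_out):
        pos = i * speed                  # fractional read position in the input
        j = int(pos)
        if j >= n_in - 1:
            out.append(float(samples[-1]))   # clamp at the last input sample
        else:
            frac = pos - j
            out.append(samples[j] * (1.0 - frac) + samples[j + 1] * frac)
    return out
```

For example, `resample_time_scale([0.0, 1.0, 2.0, 3.0], 0.5)` doubles the length to eight samples, inserting midpoints between the originals.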
Based on any of the embodiments above, the terminal may specifically be a mobile phone.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; see the method description for the relevant details.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of both. The software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method and terminal for automatically adjusting speech rate provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention; the description of the embodiments above is intended only to help understand the method of the invention and its core idea. It should be noted that those of ordinary skill in the art can make various improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the invention.

Claims (10)

  1. A method for automatically adjusting speech rate, comprising:
    acquiring input voice information;
    extracting voice feature information of the voice information;
    querying a voice database for a playback speed of the voice information corresponding to the voice feature information;
    adjusting the speed at which the voice information is played according to the playback speed.
  2. The method for automatically adjusting speech rate according to claim 1, wherein extracting the voice feature information of the voice information comprises:
    identifying language feature information of the voice information; and/or
    extracting at least one of speech rate information, feature word information, and audio information of the voice information.
  3. The method for automatically adjusting speech rate according to claim 1 or 2, wherein the voice information is voice information of a local user, the method further comprising:
    acquiring vital-sign information of the local user;
    wherein querying the voice database for the playback speed of the voice information corresponding to the voice feature information comprises:
    querying the voice database for the playback speed of the voice information corresponding to the voice feature information and the vital-sign information.
  4. The method for automatically adjusting speech rate according to claim 3, wherein querying the voice database for the playback speed of the voice information corresponding to the voice feature information and the vital-sign information further comprises:
    updating, by a machine learning algorithm using the voice feature information and the vital-sign information, the correspondence of playback speeds in the voice database.
  5. The method for automatically adjusting speech rate according to claim 1, wherein adjusting the speed at which the voice information is played according to the playback speed comprises:
    resampling the digital signal of the voice information by interpolation or decimation so that the time scale of the voice information is adjusted to the playback speed.
  6. A terminal, comprising:
    a voice information acquisition module, configured to acquire input voice information;
    a voice feature extraction module, configured to extract voice feature information of the voice information;
    a playback speed determination module, configured to query a voice database for a playback speed of the voice information corresponding to the voice feature information;
    a playback speed adjustment module, configured to adjust the speed at which the voice information is played according to the playback speed.
  7. The terminal according to claim 6, wherein the voice feature extraction module comprises:
    a first voice feature extraction unit, configured to identify language feature information of the voice information; and/or
    a second voice feature extraction unit, configured to extract at least one of speech rate information, feature word information, and audio information of the voice information.
  8. The terminal according to claim 6 or 7, wherein the voice information is voice information of a local user, and the terminal further comprises:
    a vital-sign information acquisition module, configured to acquire vital-sign information of the local user.
  9. The terminal according to claim 8, further comprising:
    a machine learning module, configured to update, by a machine learning algorithm using the voice feature information and the vital-sign information, the correspondence of playback speeds in the voice database.
  10. The terminal according to claim 6, wherein the playback speed adjustment module is specifically a module that resamples the digital signal of the voice information by interpolation or decimation so that the time scale of the voice information is adjusted to the playback speed.
PCT/CN2016/087741 2016-05-31 2016-06-29 Method for automatically adjusting speaking speed and terminal WO2017206256A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610375868.9 2016-05-31
CN201610375868.9A CN105869626B (en) 2016-05-31 2016-05-31 A kind of method and terminal of word speed automatic adjustment

Publications (1)

Publication Number Publication Date
WO2017206256A1 true WO2017206256A1 (en) 2017-12-07

Family

ID=56643245

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/087741 WO2017206256A1 (en) 2016-05-31 2016-06-29 Method for automatically adjusting speaking speed and terminal

Country Status (2)

Country Link
CN (1) CN105869626B (en)
WO (1) WO2017206256A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750436A (en) * 2020-12-29 2021-05-04 上海掌门科技有限公司 Method and equipment for determining target playing speed of voice message
CN112750456A (en) * 2020-09-11 2021-05-04 腾讯科技(深圳)有限公司 Voice data processing method and device in instant messaging application and electronic equipment
CN113470617A (en) * 2021-06-28 2021-10-01 科大讯飞股份有限公司 Speech recognition method, electronic device and storage device
CN114979798A (en) * 2022-04-21 2022-08-30 维沃移动通信有限公司 Play speed control method and electronic equipment

Families Citing this family (21)

Publication number Priority date Publication date Assignee Title
CN106448653A (en) * 2016-09-27 2017-02-22 惠州市德赛工业研究院有限公司 Wearable intelligent terminal
CN106486111B (en) * 2016-10-14 2020-02-07 北京光年无限科技有限公司 Multi-TTS engine output speech speed adjusting method and system based on intelligent robot
CN106534964B (en) * 2016-11-23 2020-02-14 广东小天才科技有限公司 Method and device for adjusting speech rate
US20180350371A1 (en) * 2017-05-31 2018-12-06 Lenovo (Singapore) Pte. Ltd. Adjust output settings based on an identified user
CN107689229A (en) * 2017-09-25 2018-02-13 广东小天才科技有限公司 Voice processing method and device for wearable equipment
CN108630224B (en) * 2018-03-22 2020-06-09 云知声智能科技股份有限公司 Method and device for controlling speech rate
CN109119088A (en) * 2018-08-29 2019-01-01 歌尔科技有限公司 A kind of adjusting method of audio signal, device, equipment and computer storage medium
CN109147802B (en) * 2018-10-22 2020-10-20 珠海格力电器股份有限公司 Playing speed adjusting method and device
CN109582275A (en) * 2018-12-03 2019-04-05 珠海格力电器股份有限公司 Voice adjusting method and device, storage medium and electronic device
CN109348068A (en) * 2018-12-03 2019-02-15 咪咕数字传媒有限公司 Information processing method, device and storage medium
CN111292737A (en) * 2018-12-07 2020-06-16 阿里巴巴集团控股有限公司 Voice interaction and voice awakening detection method, device, equipment and storage medium
CN109521718A (en) * 2019-01-11 2019-03-26 深圳汉尼康科技有限公司 Electronic speech device and control method
CN109979474B (en) * 2019-03-01 2021-04-13 珠海格力电器股份有限公司 Voice equipment and user speech rate correction method and device thereof and storage medium
CN110798327B (en) * 2019-09-04 2022-09-30 腾讯科技(深圳)有限公司 Message processing method, device and storage medium
CN111031386B (en) * 2019-12-17 2021-07-30 腾讯科技(深圳)有限公司 Video dubbing method and device based on voice synthesis, computer equipment and medium
CN112185403B (en) * 2020-09-07 2024-06-04 广州多益网络股份有限公司 Voice signal processing method and device, storage medium and terminal equipment
CN112185363B (en) * 2020-10-21 2024-02-13 北京猿力未来科技有限公司 Audio processing method and device
CN112423019B (en) * 2020-11-17 2022-11-22 北京达佳互联信息技术有限公司 Method and device for adjusting audio playing speed, electronic equipment and storage medium
CN112565880B (en) * 2020-12-28 2023-03-24 北京五街科技有限公司 Method and system for playing explanation videos
CN112565881B (en) * 2020-12-28 2023-03-24 北京五街科技有限公司 Self-adaptive video playing method and system
CN112820289A (en) * 2020-12-31 2021-05-18 广东美的厨房电器制造有限公司 Voice playing method, voice playing system, electric appliance and readable storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US20070177633A1 (en) * 2006-01-30 2007-08-02 Inventec Multimedia & Telecom Corporation Voice speed adjusting system of voice over Internet protocol (VoIP) phone and method therefor
CN101427314A (en) * 2006-04-25 2009-05-06 英特尔公司 Method and apparatus for automatic adjustment of play speed of audio data
CN101860617A (en) * 2009-04-12 2010-10-13 比亚迪股份有限公司 Mobile terminal with voice processing effect and method thereof
JP2011087196A (en) * 2009-10-16 2011-04-28 Nec Saitama Ltd Telephone set, and speech speed conversion method of telephone set
JP2015184349A (en) * 2014-03-20 2015-10-22 日本放送協会 Voice signal processing device and program
CN105405439A (en) * 2015-11-04 2016-03-16 科大讯飞股份有限公司 Voice playing method and device

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN112750456A (en) * 2020-09-11 2021-05-04 腾讯科技(深圳)有限公司 Voice data processing method and device in instant messaging application and electronic equipment
CN112750436A (en) * 2020-12-29 2021-05-04 上海掌门科技有限公司 Method and equipment for determining target playing speed of voice message
CN113470617A (en) * 2021-06-28 2021-10-01 科大讯飞股份有限公司 Speech recognition method, electronic device and storage device
CN113470617B (en) * 2021-06-28 2024-05-31 科大讯飞股份有限公司 Speech recognition method, electronic equipment and storage device
CN114979798A (en) * 2022-04-21 2022-08-30 维沃移动通信有限公司 Play speed control method and electronic equipment
CN114979798B (en) * 2022-04-21 2024-03-22 维沃移动通信有限公司 Playing speed control method and electronic equipment

Also Published As

Publication number Publication date
CN105869626A (en) 2016-08-17
CN105869626B (en) 2019-02-05

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 16903638; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 Ep: PCT application non-entry in European phase (ref document number: 16903638; country of ref document: EP; kind code of ref document: A1)