CN105869626A - Automatic speech rate adjusting method and terminal - Google Patents

Automatic speech rate adjusting method and terminal

Info

Publication number
CN105869626A
Authority
CN
China
Prior art keywords
information
speed
voice messaging
broadcasting
described voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610375868.9A
Other languages
Chinese (zh)
Other versions
CN105869626B (en
Inventor
王晓军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Original Assignee
Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yulong Computer Telecommunication Scientific Shenzhen Co Ltd filed Critical Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Priority to CN201610375868.9A priority Critical patent/CN105869626B/en
Priority to PCT/CN2016/087741 priority patent/WO2017206256A1/en
Publication of CN105869626A publication Critical patent/CN105869626A/en
Application granted granted Critical
Publication of CN105869626B publication Critical patent/CN105869626B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones

Abstract

The invention discloses an automatic speech rate adjusting method. The method comprises the steps that input speech information is acquired; speech feature information of the speech information is extracted; the playing rate of the speech information corresponding to the speech feature information is queried from a speech database; and the rate at which the speech information is played is adjusted according to that playing rate. According to the method, the preset playing rate corresponding to the speech feature information can be determined from the speech feature information of speech information input in real time, and the speech rate of the input speech information is adjusted according to the playing rate to meet the requirements of various users. The playing rate is thus adjusted adaptively according to the content of the speech information, and the method adapts well to settings such as calls and program playback. The invention further discloses a terminal which likewise adjusts the playing rate adaptively according to the content of the speech information.

Description

Method and terminal for automatic speech rate adjustment
Technical field
The present invention relates to the field of communication technology, and in particular to a method and a terminal for automatically adjusting speech rate.
Background
Because people differ in hearing ability, content played at a single speech rate sounds too fast for some listeners to follow, while others find it so slow that it wastes their time. The speech rate of content played on a terminal therefore needs to be set according to the actual needs of the listener.
In the prior art, a speech rate control is added to the client application on the user's mobile phone. The user selects a speech rate level, and the phone plays the voice content at that level. This approach has several shortcomings. First, although the speech rate is divided into several levels, the level has to be preset manually, so the speech rate cannot be adjusted dynamically, that is, adaptively. Second, the adjustment applies only to content played by the client software; the speech rate cannot be adjusted in real time during a call. Finally, the approach cannot adapt to other languages, so the speech rate cannot be adjusted according to the languages spoken by the two parties to a call. How to adjust speech rate adaptively is therefore a technical problem that those skilled in the art need to solve.
Summary of the invention
The object of the present invention is to provide a method and a terminal for automatically adjusting speech rate that determine, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that voice feature information, and adjust the speech rate of the input voice information according to that playback speed, so that the playback speed is adjusted adaptively according to the content of the voice information.
To solve the above technical problem, the present invention provides a method for automatically adjusting speech rate, comprising:
acquiring input voice information;
extracting voice feature information of the voice information;
querying a voice database for the playback speed of the voice information corresponding to the voice feature information;
adjusting the speed at which the voice information is played according to the playback speed.
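The four steps above can be sketched as a single function. This is only an illustrative outline, not the patented implementation: the function name, the parameter names, the use of a plain dictionary as the voice database and the fallback speed of 1.0 are all assumptions made for the example.

```python
from typing import Callable, Dict, Sequence


def adjust_speech_rate(
    samples: Sequence[float],
    extract_features: Callable[[Sequence[float]], str],
    voice_database: Dict[str, float],
    change_speed: Callable[[Sequence[float], float], Sequence[float]],
    default_speed: float = 1.0,
) -> Sequence[float]:
    """Hypothetical outline of the claimed steps.

    `samples` is the acquired input voice information (step 1).
    """
    # Step 2: extract the voice feature information (here reduced to one label).
    feature = extract_features(samples)
    # Step 3: query the voice database; fall back to an assumed default speed.
    speed = voice_database.get(feature, default_speed)
    # Step 4: adjust the playback speed of the voice information.
    return change_speed(samples, speed)
```

The extractor and the speed changer are passed in as callables so that the outline stays independent of how features are recognized or how the signal is resampled.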
Wherein extracting the voice feature information of the voice information comprises:
identifying language feature information of the voice information; and/or
extracting at least one of speech rate information, feature word information and audio information of the voice information.
Wherein the voice information is the voice information of the local user, and the method further comprises:
acquiring vital-sign information of the local user;
and querying the voice database for the playback speed of the voice information corresponding to the voice feature information comprises:
querying the voice database for the playback speed of the voice information corresponding to both the voice feature information and the vital-sign information.
Wherein, after querying the voice database for the playback speed of the voice information corresponding to both the voice feature information and the vital-sign information, the method further comprises:
updating, by a machine learning algorithm, the correspondence of playback speeds in the voice database, using the voice feature information and the vital-sign information.
Wherein adjusting the speed at which the voice information is played according to the playback speed comprises:
resampling the digital signal of the voice information by interpolation or decimation, so that the time scale of the voice information is adjusted to reach the playback speed.
The present invention also provides a terminal, comprising:
a voice information acquisition module, configured to acquire input voice information;
a voice feature extraction module, configured to extract voice feature information of the voice information;
a playback speed determination module, configured to query a voice database for the playback speed of the voice information corresponding to the voice feature information;
a playback speed adjustment module, configured to adjust the speed at which the voice information is played according to the playback speed.
Wherein the voice feature extraction module comprises:
a first voice feature extraction unit, configured to identify language feature information of the voice information; and/or
a second voice feature extraction unit, configured to extract at least one of speech rate information, feature word information and audio information of the voice information.
Wherein the voice information is the voice information of the local user, and the terminal further comprises:
a vital-sign information acquisition module, configured to acquire vital-sign information of the local user.
Wherein the terminal further comprises:
a machine learning module, configured to update, by a machine learning algorithm, the correspondence of playback speeds in the voice database, using the voice feature information and the vital-sign information.
Wherein the playback speed adjustment module is specifically a module that resamples the digital signal of the voice information by interpolation or decimation, so that the time scale of the voice information is adjusted to reach the playback speed.
The method for automatically adjusting speech rate provided by the present invention comprises: acquiring input voice information; extracting voice feature information of the voice information; querying a voice database for the playback speed of the voice information corresponding to the voice feature information; and adjusting the speed at which the voice information is played according to the playback speed.
It can be seen that the method determines, from the voice feature information of voice information input in real time, the predetermined playback speed corresponding to that voice feature information, and adjusts the speech rate of the input voice information according to that playback speed, so as to meet the needs of various users. The playback speed is thus adjusted adaptively according to the content of the voice information, and the method can be used in settings such as calls and program playback, which improves its adaptability. The present invention also provides a terminal that has the same beneficial effects, which are not repeated here.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for automatically adjusting speech rate provided by an embodiment of the present invention;
Fig. 2 is a structural block diagram of a terminal provided by an embodiment of the present invention;
Fig. 3 is a structural block diagram of another terminal provided by an embodiment of the present invention;
Fig. 4 is a structural block diagram of yet another terminal provided by an embodiment of the present invention.
Detailed description
The core of the present invention is to provide a method and a terminal for automatically adjusting speech rate that determine, from the voice feature information of voice information input in real time, a predetermined playback speed corresponding to that voice feature information, and adjust the speech rate of the input voice information according to that playback speed, so that the playback speed is adjusted adaptively according to the content of the voice information.
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, Fig. 1 is a flowchart of the method for automatically adjusting speech rate provided by an embodiment of the present invention. In this embodiment the method is executed by a terminal, and the terminal can be a mobile phone. The method may include:
S100: acquiring input voice information;
Here, the voice information can be acquired by monitoring the call service and applications that provide a voice playback function. That is, it can be the local user's voice information while making or answering a call, the peer user's voice information while making or answering a call, or voice information played by an application with a voice playback function.
S110: extracting voice feature information of the voice information;
The kinds of voice feature information extracted here, and how many kinds, can be chosen according to the user's actual needs, provided that the voice feature information in the voice information can be mapped, under a preset standard, to a playback speed for that voice information. In other words, automatic speech rate adjustment is achieved by adjusting the playback speech rate under a preset standard according to the voice feature information in the voice information. For example, the voice feature information here can include emotion, language, phonetic features, speech rate, intonation and similar feature information.
S120: querying a voice database for the playback speed of the voice information corresponding to the voice feature information;
After the voice feature information to be extracted has been decided, the user can preset a playback speed for each kind of voice feature information, or have several kinds of voice feature information jointly determine one playback speed. These correspondences can be stored in the voice database as a correspondence list or as a mapping table. The user can also modify, delete or add correspondences stored in the voice database as circumstances change, so that the playback speed set for each kind of voice feature information stays up to date and meets the user's actual needs.
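A mapping table of this kind might be sketched as follows. The class name, method names and the convention that 1.0 means unchanged speed are assumptions made for illustration; the source does not specify a storage format beyond a list or mapping table.

```python
from typing import Dict, Iterable, Tuple


class VoiceDatabase:
    """Hypothetical mapping from one or several feature labels to a playback speed."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, ...], float] = {}

    @staticmethod
    def _key(features: Iterable[str]) -> Tuple[str, ...]:
        # Sort the labels so that several features jointly form one order-free key.
        return tuple(sorted(features))

    def set_speed(self, features: Iterable[str], speed: float) -> None:
        """Add a correspondence, or modify it if it already exists."""
        if speed <= 0:
            raise ValueError("playback speed must be positive")
        self._table[self._key(features)] = speed

    def delete(self, features: Iterable[str]) -> None:
        """Remove a correspondence; missing entries are ignored."""
        self._table.pop(self._key(features), None)

    def lookup(self, features: Iterable[str], default: float = 1.0) -> float:
        """Return the preset speed, or the assumed default when none is stored."""
        return self._table.get(self._key(features), default)
```

For instance, `db.set_speed(["english", "irritable"], 0.7)` stores one speed for the two features taken together, and `db.lookup(["irritable", "english"])` retrieves it regardless of order.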
Querying the voice database here can also include comparing the extracted voice feature information with the range intervals stored in the voice database for that kind of voice feature information, determining which range the value of the extracted voice feature information falls in, and then taking the preset playback speed for that range. The user can also modify the range intervals of the voice feature information, or the preset playback speed of each range, according to actual needs, which accommodates individual needs and improves the user experience.
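The range-interval lookup can be illustrated with a sorted list of boundaries. The boundary values, the unit (syllables per second) and the speeds below are invented for the example and are not taken from the source.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical boundaries in syllables per second: below 3.0, 3.0 to 5.0, 5.0 and above.
RATE_BOUNDS = [3.0, 5.0]
# Hypothetical preset playback speed for each of the three ranges.
RANGE_SPEEDS = [1.2, 1.0, 0.8]


def speed_for_rate(
    rate: float,
    bounds: Sequence[float] = RATE_BOUNDS,
    speeds: Sequence[float] = RANGE_SPEEDS,
) -> float:
    """Find the range the measured speech rate falls in and return its preset speed."""
    if len(speeds) != len(bounds) + 1:
        raise ValueError("need exactly one speed per range")
    # bisect_right counts the boundaries that are <= rate, which is the range index.
    return speeds[bisect_right(bounds, rate)]
```

Editing `RATE_BOUNDS` or `RANGE_SPEEDS` corresponds to the user modifying the range intervals or the preset speed of each range.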
S130: adjusting the speed at which the voice information is played according to the playback speed.
Here, the voice information is adjusted according to the playback speed obtained, so that it reaches that playback speed. The specific method of adjustment is not limited, as long as the acquired voice information can be adjusted to the corresponding playback speed and played. One specific speech rate adjustment process is as follows: the digital signal of the voice information is resampled by interpolation or decimation, so that the time scale of the voice information is adjusted to reach the playback speed. That is, resampling the digital signal by interpolation or decimation lengthens or shortens the time scale of the speech, which changes the speech rate.
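A minimal sketch of such resampling, assuming linear interpolation between neighbouring samples and a speed factor where 1.0 means unchanged, is shown below. The function name and the choice of linear interpolation are assumptions; the source names only interpolation or decimation. Note that plain resampling played at the original sample rate also shifts pitch along with duration.

```python
from typing import List, Sequence


def resample_for_speed(samples: Sequence[float], speed: float) -> List[float]:
    """Lengthen (speed < 1) or shorten (speed > 1) a signal by linear interpolation."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    n = len(samples)
    if n == 0:
        return []
    out_len = max(1, round(n / speed))
    out: List[float] = []
    for i in range(out_len):
        pos = i * speed            # fractional read position in the input
        lo = int(pos)
        if lo >= n - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - lo
        # Weighted average of the two neighbouring input samples.
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out
```

With a speed of 0.5 the output has about twice as many samples, so it plays for twice as long at the same sample rate; with a speed of 2.0 every second sample is kept.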
For example, calling is a basic service of a mobile phone and also one of its most important functions. But some people speak quickly and some people hear poorly, which makes communication difficult. While the user is on a call, the method collects voice feature information such as emotion, language and phonetic features from the input voice information of both parties and compares it with the information in the voice database. If the comparison shows that the speech rate is too fast, or that the peer has given abnormal feedback, the playback speed corresponding to that speech rate or to that abnormal feedback is determined, and the digital signal is resampled by interpolation or decimation to lengthen or shorten the time scale of the speech, which changes the speech rate. In this way the speed of the sound played from the earpiece is adjusted automatically according to factors such as the language used by the local user or the peer user during the call and changes in emotion, so as to suit the needs of different people.
Optionally, a machine learning algorithm is used to learn from and update the voice database.
The voice database is maintained in the terminal and can store the user's voice feature information parameters, so that a machine learning algorithm can take those parameters as input and learn from them to update the voice database. The adjustment can then follow the long-term usage habits of different user groups instead of relying entirely on the originally configured data, which gives better adaptability.
The above example can be implemented as follows:
When the local user, that is, the calling user, is anxiously describing something or is emotionally agitated, and the words and phrases used in the speech match the definition of an "irritable" user in the database, the speech rate of the acquired input voice information is reduced according to the playback speed corresponding to "irritable". This has a calming effect, so that the user can use the phone's call function more efficiently and amicably.
As another example, when the calling user speaks English, the voice feature information is used to determine that the language is English, and the speech rate of the input voice information is adjusted according to the playback speed corresponding to English. After adjustment, the called user, that is, the peer user, hears the slowed-down voice information, which goes some way toward solving the difficulty of understanding a speaker of a non-native language.
Based on the above technical solution, the method for automatically adjusting speech rate provided by the embodiment of the present invention determines, from the voice feature information of voice information input in real time, the predetermined playback speed corresponding to that voice feature information, and adjusts the speech rate of the input voice information according to that playback speed, so as to meet the needs of various users. The playback speed is thus adjusted adaptively according to the content of the voice information, and the method can be used in settings such as calls and program playback, which improves its adaptability. Different users can adapt the voice playback speed to their own needs, which improves the user experience.
Based on the above embodiment, this embodiment adaptively adjusts the voice information to the playback speed corresponding to each language type according to the language type of the input voice information; that is, the playback speed is adjusted adaptively according to language type. Preferably, extracting the voice feature information of the voice information is specifically:
identifying language feature information of the voice information.
By recognizing the acquired input voice information, the language feature information of the voice information can be obtained. This language feature information can include audio parameters and feature word information, and the speed at which the voice information is played is determined from the preset playback speed corresponding to the language feature information. The user can set a playback speed for every language individually, or for a predetermined number of languages, or can divide languages into several broad categories and set a playback speed only for each category. In the last case the language feature information can be the category information itself, or the language can be identified first, the category it belongs to determined, and the corresponding playback speed then looked up. The correspondence between languages and playback speeds can be implemented as a correspondence list or a mapping table.
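The category-based variant can be sketched as two small lookup tables. The language codes, category names and speeds are placeholders chosen for the example; the source does not say which categories a user would define.

```python
from typing import Dict

# Hypothetical grouping of recognized languages into user-defined categories.
LANGUAGE_CATEGORY: Dict[str, str] = {"zh": "native", "en": "foreign", "fr": "foreign"}
# Hypothetical preset playback speed per category.
CATEGORY_SPEED: Dict[str, float] = {"native": 1.0, "foreign": 0.8}


def speed_for_language(language: str, default: float = 1.0) -> float:
    """Identify the category of a recognized language, then look up its preset speed."""
    category = LANGUAGE_CATEGORY.get(language)
    if category is None:
        # Unlisted language: fall back to the assumed default speed.
        return default
    return CATEGORY_SPEED.get(category, default)
```

Setting a speed per language instead of per category would collapse the two tables into one dictionary keyed by language.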
The language feature information can be recognized using "reference speech" for each of the user's languages synthesized by a user language identification system and a language text translation system, Markov models based on segments and syllables, pitch contours, formant vectors, acoustic features, the phonemes and prosodic features of dialects, and features of the raw speech waveform. The classification methods used can include HMMs, expert systems, clustering algorithms, secondary classification and artificial neural networks.
The above embodiment is illustrated below with several specific application scenarios:
When input voice information is detected in an application on the terminal, the acquired voice information is recognized. If the language feature information is determined to be English, the playback speed that the user has preset for English is determined, and the speech rate of the voice information is adjusted to that playback speed. English is used here only as an example.
During a call, the language of only the local user's voice information can be detected, or of only the peer user's voice information, or of both. The last case is used as the example below:
At the start, the phone is in a normal call state and the calling and called parties are connected. The voice information acquisition module acquires the input voice information. The voice feature extraction module extracts the audio parameters and key words and phrases of both parties. The playback speed determination module parses the extracted audio parameters, queries the voice database to determine the language, and determines the user's preset playback speed for that language. The playback speed adjustment module lengthens or shortens the voice information in time. The earpiece plays the processed voice information. Both parties hang up and the call ends.
In this embodiment the user can judge their own ability to follow each language and set the playback speed accordingly, which addresses the difficulty of understanding a speaker of a non-native language.
Based on any of the above embodiments, this embodiment mainly applies to voice communication between users, where an overly fast speech rate, emotional agitation and similar situations may occur. So that communication can proceed smoothly in these situations, the user's state is determined from the voice feature information of the user's voice information, and the playback speed set for that state is then determined; that is, the playback speed is adjusted adaptively according to the user's speaking state. Preferably, extracting the voice feature information of the voice information is specifically:
extracting at least one of speech rate information, feature word information and audio information of the voice information.
Here it is first necessary to determine the user state that each kind of voice feature information corresponds to or reflects, and then which playback speed should be set for that state. The judgment can be made from speech rate information alone, from feature word information alone, and so on; speech rate information, feature word information and audio information can be combined in any way.
When one kind is used alone, the cases for that kind of voice feature information are classified and a playback speed is set for each class. Taking speech rate information as an example, a user generally speaks too fast when irritable, so when the speech rate information exceeds a certain value the user can be considered irritable and the voice information is set to the predetermined playback speed for the irritable state. Alternatively, the speech rate can be divided into several ranges, with a playback speed set for each range.
To improve the accuracy of speech rate adjustment, speech rate information, feature word information and audio information are preferably used together, so that the playback speed is determined from the three features in combination. For example, an irritable user generally speaks too fast, uses certain particular words (users can set their own habitual words for when they are irritable), and speaks in a raised voice. If all three, or at least two, of these occur, the user can be considered irritable and the voice information is set to the predetermined playback speed for the irritable state.
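The "all three, or at least two" rule amounts to a simple vote over three indicators. In the sketch below every threshold, the trigger-word set and the normalized `audio_level` input (which could stand for loudness or pitch) are hypothetical values chosen for illustration.

```python
from typing import FrozenSet, Iterable


def is_irritable(
    rate: float,
    words: Iterable[str],
    audio_level: float,
    *,
    rate_limit: float = 5.0,                                  # assumed syllables/second threshold
    trigger_words: FrozenSet[str] = frozenset({"hurry", "now"}),  # assumed user-set habitual words
    level_limit: float = 0.7,                                 # assumed normalized level threshold
    min_votes: int = 2,
) -> bool:
    """Return True when at least `min_votes` of the three indicators fire."""
    votes = [
        rate > rate_limit,                                    # speaking too fast
        any(w.lower() in trigger_words for w in words),       # habitual word used
        audio_level > level_limit,                            # raised voice
    ]
    return sum(votes) >= min_votes
```

Requiring two of three indicators makes the decision less sensitive to any single noisy measurement than a speech-rate threshold alone.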
Word speed information in this embodiment, Feature Words information and audio-frequency information can arbitrarily be believed with languages feature Breath is combined using.As arranged broadcasting speed corresponding under each word speed scope of English, each language of Chinese Broadcasting speed corresponding under speed scope.
Based on above-described embodiment, the problem of user's energy Automatic adjusument call word speed.Make different user permissible Change playout of voice according to self-demand, promote user's impression.
Based on above-mentioned any embodiment, this embodiment is mainly for being determined more accurately this end subscriber State, and then determine this end subscriber broadcasting speed in this condition;Can speak according to this end subscriber State self-adaption regulation broadcasting speed.The most described voice messaging is the voice messaging of this end subscriber, the method Can also include:
Obtain the sign information of described end subscriber;
The corresponding described voice messaging that inquiry is corresponding with described voice characteristics information from speech database Broadcasting speed, including:
From speech database inquire about corresponding with described voice characteristics information and described sign information described in The broadcasting speed of voice messaging.
In the above embodiments, the user's state can be determined from the speech-rate information, the feature-word information and the audio information. To determine more accurately whether the local user is in that state, vital-sign information of the local user can also be acquired; the vital-sign information may include the local user's body temperature, pulse, and so on. The vital-sign information can be collected by a smart wearable device paired with the terminal, such as a smart bracelet.
For example, when the local user (i.e. the calling-terminal user) is anxious to state something or is agitated, the words and phrases appearing in the voice information match the database's definition of an irritable user, and the smart bracelet reports information such as a quickened pulse. It can then be determined that the user is in an irritable state, and the speech rate of the acquired input voice information can be reduced according to the playback speed corresponding to irritability. This has a calming effect, so that the user can use the phone's call function more efficiently and amicably. The detailed process can be as follows:
The mobile phone is in a normal call state, with the calling and called parties connected. The user's voice information is collected, and information such as the user's body temperature and pulse during the call is collected through the smart bracelet. The speech database is queried and, by combining the changes in body temperature and pulse during the call with the use of key words and phrases (i.e. the feature-word information), it is judged whether the user is emotionally agitated. Whether adjustment is needed is also judged from the speech-rate information. If the condition for adjustment is met, the adjustment is made according to the preset value in the speech database and a new playback speed is determined. The voice data is then stretched or shortened in time, and the earpiece plays the processed voice data. The user's emotion-change information and characteristic statements can also be written to the speech database to improve subsequent emotion judgments.
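The call-handling flow above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the feature words, the pulse and speech-rate thresholds, the speed factors, and the names `CallSample`, `classify_state` and `choose_playback_speed` are all hypothetical, since the description gives no concrete values or data structures.

```python
from dataclasses import dataclass

# Hypothetical values: the description does not specify any of these.
AGITATED_WORDS = {"hurry", "now", "immediately"}
PULSE_THRESHOLD_BPM = 100
FAST_SPEECH_WPM = 180

# Hypothetical "speech database": user state -> playback speed factor.
# A factor below 1.0 slows playback down.
SPEED_TABLE = {"agitated": 0.8, "normal": 1.0}


@dataclass
class CallSample:
    words: list              # words recognized in the local user's speech
    words_per_minute: float  # measured speech rate
    pulse_bpm: float         # pulse reported by the paired wearable


def classify_state(sample: CallSample) -> str:
    """Combine feature words with the pulse reading to judge agitation."""
    has_feature_word = any(w.lower() in AGITATED_WORDS for w in sample.words)
    pulse_raised = sample.pulse_bpm > PULSE_THRESHOLD_BPM
    return "agitated" if (has_feature_word and pulse_raised) else "normal"


def choose_playback_speed(sample: CallSample) -> float:
    """Look up the speed factor; adjust only when the speech is also fast."""
    state = classify_state(sample)
    if state == "agitated" and sample.words_per_minute > FAST_SPEECH_WPM:
        return SPEED_TABLE["agitated"]
    return SPEED_TABLE["normal"]
```

In this sketch, requiring both a feature-word match and a raised pulse mirrors the idea that vital signs are used to confirm the state suggested by the speech, rather than to decide it alone.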
Based on any of the above embodiments, this embodiment mainly improves the accuracy of the speech database. Accordingly, the method further includes:
Updating the playback-speed correspondences in the speech database according to a machine learning algorithm, using the voice feature information and the vital-sign information.
Here, a speech database is maintained in the terminal and can store the user's audio information parameters, so that the terminal has a learning capability for speech-rate adjustment. The adjustment can follow the long-term usage habits of different user groups instead of relying entirely on the initially set data, which gives better adaptability. With this learning capability, the key terms that the user frequently uses (i.e. the feature-word information) can be updated continually, improving subsequent judgments of the user's emotion.
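The description does not name a particular machine learning algorithm. As one possible illustration only, the sketch below keeps a table of playback speeds and nudges each entry toward an observed preferred speed with an exponential moving average. The class `SpeechDatabase`, the key format, the `learning_rate` value, and the idea that a preferred speed is observed (for example from a manual correction by the user) are all assumptions, not part of the source.

```python
class SpeechDatabase:
    """Toy playback-speed table keyed by (voice feature, vital-sign band)."""

    def __init__(self, defaults, learning_rate=0.2):
        self.speeds = dict(defaults)        # initial preset values
        self.learning_rate = learning_rate  # how quickly entries adapt

    def lookup(self, key):
        # Unknown combinations fall back to unchanged playback speed.
        return self.speeds.get(key, 1.0)

    def update(self, key, observed_speed):
        """Move the stored speed a fraction of the way toward the observation."""
        old = self.lookup(key)
        new = old + self.learning_rate * (observed_speed - old)
        self.speeds[key] = new
        return new
```

With a small learning rate, a single unusual call changes the stored value only slightly, while a consistent long-term habit gradually replaces the preset.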
Based on the above technical solution, the method for automatically adjusting speech rate provided by the embodiments of the present invention can determine, from the voice feature information of the voice information input in real time, the predetermined playback speed corresponding to that voice feature information, and can adjust the speech rate of the input voice information according to that playback speed, so as to meet the needs of various users. In other words, the playback speed is adjusted adaptively according to the content of the voice information. The method can be used in scenarios such as user calls and program playback, which improves its adaptability. Different users can adapt the voice playback speed to their own needs, which improves the user experience.
The embodiments of the present invention provide a method for automatically adjusting speech rate that can determine, from the voice feature information of the voice information input in real time, the predetermined playback speed corresponding to that voice feature information, and can adjust the speech rate of the input voice information according to that playback speed.
The terminal provided by the embodiments of the present invention is described below. The terminal described below and the method for automatically adjusting speech rate described above may be read with reference to each other.
Refer to Fig. 2, which is a structural block diagram of the terminal provided by an embodiment of the present invention. The terminal may include:
A voice information acquisition module 100, configured to acquire input voice information;
A voice feature extraction module 200, configured to extract voice feature information from the voice information;
A playback speed determination module 300, configured to query the speech database for the playback speed of the voice information corresponding to the voice feature information;
A playback speed adjustment module 400, configured to adjust the speed at which the voice information is played according to the playback speed.
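How the four modules could be chained is sketched below in Python. This is a structural illustration only: the class name `Terminal`, the use of plain callables for each module, a single string as the voice feature, and the fallback speed of 1.0 for unknown features are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Samples = List[float]


class Terminal:
    """Chains modules 100-400; each stage is a pluggable callable."""

    def __init__(
        self,
        acquire: Callable[[], Samples],               # module 100
        extract: Callable[[Samples], str],            # module 200
        speech_db: Dict[str, float],                  # queried by module 300
        adjust: Callable[[Samples, float], Samples],  # module 400
    ) -> None:
        self.acquire = acquire
        self.extract = extract
        self.speech_db = speech_db
        self.adjust = adjust

    def determine_speed(self, feature: str) -> float:
        # Assumed fallback: leave the speed unchanged for unknown features.
        return self.speech_db.get(feature, 1.0)

    def run(self) -> Tuple[float, Samples]:
        voice = self.acquire()                   # obtain input voice
        feature = self.extract(voice)            # extract voice feature
        speed = self.determine_speed(feature)    # query the speech database
        return speed, self.adjust(voice, speed)  # adjust playback
```

Keeping each stage behind a callable makes it straightforward to swap in the optional variants, for example a determination step that also takes vital-sign information into account.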
Optionally, the voice feature extraction module 200 includes:
A first voice feature extraction unit, configured to identify language feature information of the voice information; and/or,
A second voice feature extraction unit, configured to extract at least one of speech-rate information, feature-word information and audio information of the voice information.
Optionally, referring to Fig. 3, where the voice information is the voice information of the local user, the terminal further includes:
A vital-sign information acquisition module 500, configured to acquire vital-sign information of the local user.
In this case, the playback speed determination module 300 is specifically a module that queries the speech database for the playback speed of the voice information corresponding to both the voice feature information and the vital-sign information.
Optionally, referring to Fig. 4, the terminal further includes:
A machine learning module 600, configured to update the playback-speed correspondences in the speech database according to a machine learning algorithm, using the voice feature information and the vital-sign information.
Optionally, the playback speed adjustment module 400 is specifically a module that resamples the digital signal of the voice information by interpolation or decimation, adjusting the time scale of the voice information so that it reaches the playback speed.
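Resampling by interpolation or decimation can be sketched in pure Python as follows. This is a minimal illustration using linear interpolation; the function name `resample_for_speed` and the convention that `speed` is a multiplicative factor (above 1.0 is faster, below 1.0 is slower) are assumptions. Note also that plain resampling played back at the original sample rate shifts the pitch along with the duration; the description names only interpolation and decimation and does not say how, or whether, pitch is preserved.

```python
def resample_for_speed(samples, speed):
    """Return samples stretched or shortened in time by the given speed factor.

    speed > 1.0 yields fewer samples (decimation, faster playback);
    speed < 1.0 yields more samples (interpolation, slower playback).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n = len(samples)
    if n < 2:
        return list(samples)

    out_len = max(2, round(n / speed))
    step = (n - 1) / (out_len - 1)  # spacing of output points on the input axis

    out = []
    for i in range(out_len):
        pos = i * step
        lo = min(int(pos), n - 2)   # left neighbour, clamped for the last point
        frac = pos - lo
        # Linear interpolation between the two neighbouring input samples.
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out
```

For example, calling `resample_for_speed(voice, 0.8)` on a list of samples produces roughly 25% more samples, so the same speech takes longer to play at the unchanged output sample rate.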
Based on any of the above embodiments, the terminal may specifically be a mobile phone.
The embodiments in this description are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, see the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their function. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method and terminal for automatically adjusting speech rate provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.

Claims (10)

1. A method for automatically adjusting speech rate, characterised in that it includes:
Acquiring input voice information;
Extracting voice feature information from the voice information;
Querying a speech database for the playback speed of the voice information corresponding to the voice feature information;
Adjusting the speed at which the voice information is played according to the playback speed.
2. The method for automatically adjusting speech rate as claimed in claim 1, characterised in that extracting the voice feature information from the voice information includes:
Identifying language feature information of the voice information; and/or,
Extracting at least one of speech-rate information, feature-word information and audio information of the voice information.
3. The method for automatically adjusting speech rate as claimed in claim 1 or 2, characterised in that the voice information is the voice information of the local user, and the method further includes:
Acquiring vital-sign information of the local user;
Querying the speech database for the playback speed of the voice information corresponding to the voice feature information includes:
Querying the speech database for the playback speed of the voice information corresponding to both the voice feature information and the vital-sign information.
4. The method for automatically adjusting speech rate as claimed in claim 3, characterised in that, after querying the speech database for the playback speed of the voice information corresponding to the voice feature information and the vital-sign information, the method further includes:
Updating the playback-speed correspondences in the speech database according to a machine learning algorithm, using the voice feature information and the vital-sign information.
5. The method for automatically adjusting speech rate as claimed in claim 1, characterised in that adjusting the speed at which the voice information is played according to the playback speed includes:
Resampling the digital signal of the voice information by interpolation or decimation, adjusting the time scale of the voice information so that it reaches the playback speed.
6. A terminal, characterised in that it includes:
A voice information acquisition module, configured to acquire input voice information;
A voice feature extraction module, configured to extract voice feature information from the voice information;
A playback speed determination module, configured to query a speech database for the playback speed of the voice information corresponding to the voice feature information;
A playback speed adjustment module, configured to adjust the speed at which the voice information is played according to the playback speed.
7. The terminal as claimed in claim 6, characterised in that the voice feature extraction module includes:
A first voice feature extraction unit, configured to identify language feature information of the voice information; and/or,
A second voice feature extraction unit, configured to extract at least one of speech-rate information, feature-word information and audio information of the voice information.
8. The terminal as claimed in claim 6 or 7, characterised in that the voice information is the voice information of the local user, and the terminal further includes:
A vital-sign information acquisition module, configured to acquire vital-sign information of the local user.
9. The terminal as claimed in claim 8, characterised in that it further includes:
A machine learning module, configured to update the playback-speed correspondences in the speech database according to a machine learning algorithm, using the voice feature information and the vital-sign information.
10. The terminal as claimed in claim 6, characterised in that the playback speed adjustment module is specifically a module that resamples the digital signal of the voice information by interpolation or decimation, adjusting the time scale of the voice information so that it reaches the playback speed.
CN201610375868.9A 2016-05-31 2016-05-31 A kind of method and terminal of word speed automatic adjustment Active CN105869626B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610375868.9A CN105869626B (en) 2016-05-31 2016-05-31 A kind of method and terminal of word speed automatic adjustment
PCT/CN2016/087741 WO2017206256A1 (en) 2016-05-31 2016-06-29 Method for automatically adjusting speaking speed and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610375868.9A CN105869626B (en) 2016-05-31 2016-05-31 A kind of method and terminal of word speed automatic adjustment

Publications (2)

Publication Number Publication Date
CN105869626A true CN105869626A (en) 2016-08-17
CN105869626B CN105869626B (en) 2019-02-05

Family

ID=56643245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610375868.9A Active CN105869626B (en) 2016-05-31 2016-05-31 A kind of method and terminal of word speed automatic adjustment

Country Status (2)

Country Link
CN (1) CN105869626B (en)
WO (1) WO2017206256A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750436B (en) * 2020-12-29 2022-12-30 上海掌门科技有限公司 Method and equipment for determining target playing speed of voice message
CN113470617A (en) * 2021-06-28 2021-10-01 科大讯飞股份有限公司 Speech recognition method, electronic device and storage device
CN114979798B (en) * 2022-04-21 2024-03-22 维沃移动通信有限公司 Playing speed control method and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070177633A1 (en) * 2006-01-30 2007-08-02 Inventec Multimedia & Telecom Corporation Voice speed adjusting system of voice over Internet protocol (VoIP) phone and method therefor
CN101427314A (en) * 2006-04-25 2009-05-06 英特尔公司 Method and apparatus for automatic adjustment of play speed of audio data
CN101860617A (en) * 2009-04-12 2010-10-13 比亚迪股份有限公司 Mobile terminal with voice processing effect and method thereof
JP2011087196A (en) * 2009-10-16 2011-04-28 Nec Saitama Ltd Telephone set, and speech speed conversion method of telephone set
JP2015184349A (en) * 2014-03-20 2015-10-22 日本放送協会 Voice signal processing device and program
CN105405439A (en) * 2015-11-04 2016-03-16 科大讯飞股份有限公司 Voice playing method and device

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448653A (en) * 2016-09-27 2017-02-22 惠州市德赛工业研究院有限公司 Wearable intelligent terminal
CN106486111A (en) * 2016-10-14 2017-03-08 北京光年无限科技有限公司 Many tts engines output word speed control method and system based on intelligent robot
CN106534964B (en) * 2016-11-23 2020-02-14 广东小天才科技有限公司 Method and device for adjusting speech rate
CN106534964A (en) * 2016-11-23 2017-03-22 广东小天才科技有限公司 Speed adjusting method and device
CN108984078A (en) * 2017-05-31 2018-12-11 联想(新加坡)私人有限公司 The method and information processing unit of output setting are adjusted based on the user identified
CN107689229A (en) * 2017-09-25 2018-02-13 广东小天才科技有限公司 A kind of method of speech processing and device for wearable device
CN108630224A (en) * 2018-03-22 2018-10-09 北京云知声信息技术有限公司 Control the method and device of word speed
CN108630224B (en) * 2018-03-22 2020-06-09 云知声智能科技股份有限公司 Method and device for controlling speech rate
CN109119088A (en) * 2018-08-29 2019-01-01 歌尔科技有限公司 A kind of adjusting method of audio signal, device, equipment and computer storage medium
CN109147802A (en) * 2018-10-22 2019-01-04 珠海格力电器股份有限公司 A kind of broadcasting word speed adjusting method and device
CN109348068A (en) * 2018-12-03 2019-02-15 咪咕数字传媒有限公司 A kind of information processing method, device and storage medium
CN109582275A (en) * 2018-12-03 2019-04-05 珠海格力电器股份有限公司 Voice regulation method, device, storage medium and electronic device
CN111292737A (en) * 2018-12-07 2020-06-16 阿里巴巴集团控股有限公司 Voice interaction and voice awakening detection method, device, equipment and storage medium
CN109521718A (en) * 2019-01-11 2019-03-26 深圳汉尼康科技有限公司 Electronic speech device and control method
CN109979474A (en) * 2019-03-01 2019-07-05 珠海格力电器股份有限公司 Speech ciphering equipment and its user speed modification method, device and storage medium
CN109979474B (en) * 2019-03-01 2021-04-13 珠海格力电器股份有限公司 Voice equipment and user speech rate correction method and device thereof and storage medium
CN110798327A (en) * 2019-09-04 2020-02-14 腾讯科技(深圳)有限公司 Message processing method, device and storage medium
CN111031386A (en) * 2019-12-17 2020-04-17 腾讯科技(深圳)有限公司 Video dubbing method and device based on voice synthesis, computer equipment and medium
CN111031386B (en) * 2019-12-17 2021-07-30 腾讯科技(深圳)有限公司 Video dubbing method and device based on voice synthesis, computer equipment and medium
CN112185403A (en) * 2020-09-07 2021-01-05 广州多益网络股份有限公司 Voice signal processing method and device, storage medium and terminal equipment
CN112750456A (en) * 2020-09-11 2021-05-04 腾讯科技(深圳)有限公司 Voice data processing method and device in instant messaging application and electronic equipment
CN112185363A (en) * 2020-10-21 2021-01-05 北京猿力未来科技有限公司 Audio processing method and device
CN112185363B (en) * 2020-10-21 2024-02-13 北京猿力未来科技有限公司 Audio processing method and device
CN112423019A (en) * 2020-11-17 2021-02-26 北京达佳互联信息技术有限公司 Method and device for adjusting audio playing speed, electronic equipment and storage medium
CN112423019B (en) * 2020-11-17 2022-11-22 北京达佳互联信息技术有限公司 Method and device for adjusting audio playing speed, electronic equipment and storage medium
CN112565880A (en) * 2020-12-28 2021-03-26 北京五街科技有限公司 Method for playing explanation videos
CN112565881A (en) * 2020-12-28 2021-03-26 北京五街科技有限公司 Self-adaptive video playing method
CN112565881B (en) * 2020-12-28 2023-03-24 北京五街科技有限公司 Self-adaptive video playing method and system
CN112565880B (en) * 2020-12-28 2023-03-24 北京五街科技有限公司 Method and system for playing explanation videos
CN112820289A (en) * 2020-12-31 2021-05-18 广东美的厨房电器制造有限公司 Voice playing method, voice playing system, electric appliance and readable storage medium

Also Published As

Publication number Publication date
WO2017206256A1 (en) 2017-12-07
CN105869626B (en) 2019-02-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant