CN104078076B - Voice input method and system - Google Patents

Voice input method and system

Info

Publication number
CN104078076B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410265393.9A
Other languages
Chinese (zh)
Other versions
CN104078076A (en)
Inventor
潘青华
钱柄桦
何婷婷
王智国
胡郁
刘庆峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN201410265393.9A
Publication of CN104078076A
Application granted
Publication of CN104078076B
Legal status: Active

Landscapes

  • Electrically Operated Instructional Devices (AREA)
  • Electric Clocks (AREA)

Abstract

The invention discloses a voice input method and system, belonging to the technical field of voice input. The voice input method includes: receiving, in real time, the audio signal produced while a user inputs speech; performing endpoint detection on the audio signal and determining from the detection result whether the speech in the audio signal is in a paused state; if so, calculating an endpoint time at a preset period and showing endpoint prompt information to the user according to the calculation result until the pause ends. The endpoint time includes the time remaining from the current moment until the current speech clause ends automatically. The voice input method and system can effectively improve the quality of voice input and thereby the accuracy of speech recognition.

Description

Voice input method and system
Technical field
The present invention relates to the technical field of voice input, and in particular to a voice input method and system.
Background art
After years of technical development, voice input has become an important non-keyboard input method that is widely used on PCs, smartphones and other portable devices. Typically, after obtaining the speech input by the user, a speech recognition system decodes the speech signal into a text string and feeds it back to the user. The accuracy of speech recognition depends heavily on the quality of the voice input. In general, the more standard the accent, the steadier the speaking rate, the more accurate the pauses and the more moderate the volume, the higher the speech quality and, correspondingly, the higher the recognition accuracy.
Fig. 1 is a flow chart of a voice input method in the prior art.
The prior-art voice input method generally includes the following steps:
Step 101: After receiving the user's recording start instruction, begin receiving in real time the audio signal produced while the user inputs speech.
The recording start instruction is usually a trigger of the recording start button by the user; the user can press the start button manually to begin recording.
Step 102: Perform speech analysis on the audio signal and show the analysis result to the user.
The speech analysis mainly analyzes the speech volume or the signal amplitude (which indicates how loud the sound is). The volume level is represented by the number of energy bars on an indicator, so that the user can control the volume while inputting speech.
Step 103: If the user's recording end instruction is received, stop the voice input; otherwise continue the voice input.
The recording end instruction is usually a trigger of the recording end button by the user; the user can press the end button manually to stop the voice input. Alternatively, a preset endpoint detection module can judge automatically whether the recording has ended.
In the prior-art voice input method, the speech analysis result generally contains only volume-related information. Based on the analysis result, the user can adjust only the volume of the voice input; the user cannot control the input speed and does not know when to pause. An unsuitable input speed therefore easily leads to poor voice input quality, so that speech recognition fails or its accuracy is low.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a voice input method and system that can effectively improve the quality of voice input and thereby the accuracy of speech recognition.
The technical solutions provided by the embodiments of the present invention are as follows:
In one aspect, a voice input method is provided, including:
receiving, in real time, the audio signal produced while a user inputs speech;
performing endpoint detection on the audio signal, and determining from the detection result whether the speech in the audio signal is in a paused state;
if so, calculating an endpoint time at a preset period and showing endpoint prompt information to the user according to the calculation result until the pause ends; the endpoint time includes: the time remaining from the current moment until the current speech clause ends automatically.
Preferably, the endpoint time further includes: the time remaining from the current moment until the current voice input ends automatically.
Preferably, calculating the endpoint time at the preset period includes: calculating the time remaining from the current moment until the current speech clause ends automatically, and the time remaining from the current moment until the current voice input ends automatically;
calculating the time remaining from the current moment until the current speech clause ends automatically includes: obtaining a first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining from the current moment until the current speech clause ends automatically;
calculating the time remaining from the current moment until the current voice input ends automatically includes: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining from the current moment until the current voice input ends automatically;
the first preset duration is the minimum time interval between speech clauses; the second preset duration is the time from the detected end endpoint of the speech until the current voice input ends automatically.
Preferably, showing endpoint prompt information to the user according to the calculation result until the pause ends includes:
if the time remaining until the current speech clause ends automatically and the time remaining until the current voice input ends automatically are both greater than zero, showing both remaining times to the user;
if the time remaining until the current speech clause ends automatically is less than or equal to zero and the time remaining until the current voice input ends automatically is greater than zero, showing clause-end prompt information to the user, together with the time remaining until the current voice input ends automatically;
if the time remaining until the current voice input ends automatically is less than or equal to zero, showing the user prompt information that the current voice input has ended automatically.
Preferably, showing endpoint prompt information to the user includes:
showing the endpoint prompt information to the user using any one or more of a numeric display, a progress bar and a prompt tone.
In another aspect, a voice input system is provided, including:
a receiving module, configured to receive in real time the audio signal produced while a user inputs speech;
an endpoint detection module, configured to perform endpoint detection on the audio signal;
a determining module, configured to determine, from the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;
a calculation module, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state; the endpoint time includes: the time remaining from the current moment until the current speech clause ends automatically;
a display module, configured to show endpoint prompt information to the user according to the calculation result of the calculation module until the pause ends.
Preferably, the endpoint time further includes: the time remaining from the current moment until the current voice input ends automatically.
Preferably, the calculation module includes:
a first calculation unit, configured to calculate at the preset period, after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current speech clause ends automatically, including: obtaining a first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining from the current moment until the current speech clause ends automatically, the first preset duration being the minimum time interval between speech clauses;
a second calculation unit, configured to calculate at the preset period, after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current voice input ends automatically, including: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining from the current moment until the current voice input ends automatically, the second preset duration being the time from the detected end endpoint of the speech until the current voice input ends automatically.
Preferably, the display module is specifically configured to: when the time remaining until the current speech clause ends automatically and the time remaining until the current voice input ends automatically are both greater than zero, show both remaining times to the user; when the time remaining until the current speech clause ends automatically is less than or equal to zero and the time remaining until the current voice input ends automatically is greater than zero, show clause-end prompt information to the user, together with the time remaining until the current voice input ends automatically; and when the time remaining until the current voice input ends automatically is less than or equal to zero, show the user prompt information that the current voice input has ended automatically.
Preferably, the display module is specifically configured to show the endpoint prompt information to the user using any one or more of a numeric display, a progress bar and a prompt tone.
The voice input method and system provided by the embodiments of the present invention use endpoint detection to determine whether the speech signal is in a paused state. When it is, endpoint prompt information is shown to the user, so that the user knows how much time remains before the current speech clause ends automatically, can adjust the input speed accordingly, and can pause at a suitable moment. This effectively improves the quality of voice input and thereby the accuracy of speech recognition.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them.
Fig. 1 is a flow chart of a voice input method in the prior art;
Fig. 2 is a flow chart of a voice input method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a voice input system provided by an embodiment of the present invention;
Fig. 4 is another schematic structural diagram of a voice input system provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a voice input method and system. By showing endpoint prompt information to the user, they enable the user to adjust the input speed and to control sensibly when to pause and for how long, thereby effectively improving the quality of voice input and, in turn, the accuracy of speech recognition.
Fig. 2 is a flow chart of a voice input method provided by an embodiment of the present invention, which includes the following steps:
Step 201: Receive, in real time, the audio signal produced while the user inputs speech.
Step 202: Perform endpoint detection on the audio signal, and determine from the detection result whether the speech in the audio signal is in a paused state.
Because the speech signal in the audio signal is short-time stationary, the audio signal can be divided into frames: the whole audio is split into sub-segments of a specific length, which ensures the spectral continuity of the sub-segment audio. Because the length of audio signal that can be processed at one time is limited, the audio signal also needs to be windowed, so that each processing step is confined to the signal within one window. A Hamming window or a Hanning window, for example, can be used. Preferably, each frame of sub-segment audio is 25 ms long and the frame shift is 10 ms. After framing and windowing, a piece of audio of a given length yields multiple speech frames. The speech frame is the smallest unit for judging speech versus non-speech in the audio signal.
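The framing and windowing step can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `hamming_window` and `split_into_frames`, and the choice to drop a trailing partial frame, are assumptions; only the 25 ms frame length, the 10 ms frame shift and the Hamming window come from the text.

```python
import math


def hamming_window(length):
    """Return Hamming window coefficients 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def split_into_frames(samples, sample_rate, frame_ms=25, shift_ms=10):
    """Split samples into overlapping frames and apply a Hamming window.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    shift = sample_rate * shift_ms // 1000       # samples per frame shift
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

At a 16 kHz sampling rate, for instance, each frame holds 400 samples and consecutive frames start 160 samples apart, so adjacent frames overlap by 15 ms.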
Endpoint detection essentially computes feature information for each resulting speech frame, for example time-domain energy, frequency-domain energy or zero-crossing rate, in order to distinguish speech from non-speech; non-speech can be either silence or noise. For an audio signal recorded in a quiet environment, the energy of speech segments is generally higher than that of non-speech segments, and the zero-crossing rate of the speech signal is generally lower than that of the non-speech signal. The zero-crossing rate is the number of times per unit time that the sampled audio signal value crosses zero (changes from positive to negative or from negative to positive). By computing these features, speech and non-speech can be distinguished effectively, so it can be judged whether the current audio signal is a speech signal or a non-speech signal. When the current audio signal is judged to be non-speech, the speech in the audio signal can be considered to be in a paused state. Endpoint detection can therefore effectively identify the start endpoint and the end endpoint of speech in the audio signal.
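A simple frame-level speech/non-speech decision based on the two time-domain features named above might look like the sketch below. The threshold values and the rule of combining the features with a logical AND are assumptions made for illustration; the patent does not specify them. The zero-crossing rate is normalized here per adjacent sample pair rather than per unit time.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame (time-domain energy)."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech(frame, energy_threshold=0.01, zcr_threshold=0.3):
    """Classify a frame as speech if it is loud and crosses zero rarely.

    Both thresholds are hypothetical and would need tuning on real audio.
    """
    return (frame_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)
```

A run of consecutive frames for which `is_speech` returns `False` would then be treated as a pause.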
Step 203: If so, calculate the endpoint time at a preset period, and show endpoint prompt information to the user according to the calculation result until the pause ends.
If the endpoint detection result shows that the speech in the audio signal has not paused, information indicating that the speech signal has not paused may also be fed back to the user at the preset period, so that on seeing it the user knows that the speech signal has not paused.
When it is detected that the speech signal, after pausing for some time, leaves the paused state and voice input continues, the endpoint time can be restored to its default value (for example, reset). When a pause in the speech signal is detected again, the updated endpoint time is again calculated at the preset period. Whether the user has continued voice input can be determined by the endpoint detection described above: if the detection result shows that the speech signal has left the paused state after pausing for a while, the user can be considered to have continued voice input; otherwise, the speech signal can be considered to remain in the paused state.
The endpoint time can include the time remaining from the current moment until the current speech clause ends automatically, denoted M ms (milliseconds). Because the data processing speed is fixed, the amount of audio data that can be processed at one time can be converted into a time length, denoted K ms. From the moment the speech signal starts to pause until the pause ends, a new endpoint time is calculated and fed back every K ms, and the endpoint prompt information is shown to the user at the same time. In the embodiments of the present invention, for ease of description, K is called the feedback interval or the preset period. The endpoint time M indicates how much longer, counted from the current moment, the speech signal must remain paused before the current speech clause ends automatically.
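The reset behaviour and the feedback interval K described above can be sketched as a small state holder that is updated once per interval. The class name `PauseTracker` and the assumption that one speech/non-speech decision arrives every K ms are illustrative choices, not taken from the patent.

```python
class PauseTracker:
    """Track how long the current pause has lasted, in steps of K ms."""

    def __init__(self, k_ms=50):
        self.k_ms = k_ms      # feedback interval K (preset period)
        self.pause_ms = 0     # duration of the current pause so far

    def update(self, speech_detected):
        """Advance by one feedback interval and return the pause duration."""
        if speech_detected:
            self.pause_ms = 0              # speech resumed: reset the pause
        else:
            self.pause_ms += self.k_ms     # pause continues for another K ms
        return self.pause_ms
```

Feeding the tracker the decisions pause, pause, speech, pause gives pause durations of 50, 100, 0 and 50 ms, which shows the countdown restarting after the user resumes speaking.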
The endpoint time can also include the time remaining from the current moment until the current voice input ends automatically, denoted N ms. The endpoint time N indicates how much longer, counted from the current moment, the speech signal must remain paused before the current voice input ends automatically. Preferably, N ≥ M.
In the embodiments of the present invention, two time lengths can be preset: a first preset duration T_1 and a second preset duration T_2. The first preset duration T_1 is the minimum time interval between speech clauses, and the second preset duration T_2 is the time from the detected end endpoint of the speech until the current voice input ends automatically, so 0 ≤ M ≤ T_1 and 0 ≤ N ≤ T_2. After the speech signal pauses, let the pause duration be T_s. If T_s is greater than or equal to T_1, the speech before and after the pause is judged to belong to different speech clauses; if T_s is less than T_1, the speech before and after the pause is judged to belong to the same speech clause; if T_s is greater than or equal to T_2, the current voice input is judged to end automatically. Preferably, T_1 can be set to 300 to 400 ms, T_2 to 1000 to 2000 ms, and K to 50 ms.
After the speech signal pauses, the pause duration T_s at the first feedback does not exceed the feedback interval K, so T_s ≤ K. Since the pause duration at the first feedback of the endpoint time is T_s, the initial feedback value of M is M_0 = T_1 - T_s and the initial feedback value of N is N_0 = T_2 - T_s. Thereafter, as long as the speech signal remains paused, M and N are updated every K ms as follows: M_i = M_{i-1} - K, N_i = N_{i-1} - K.
Calculating the endpoint time at the preset period thus includes calculating the time M remaining from the current moment until the current speech clause ends automatically and the time N remaining from the current moment until the current voice input ends automatically. M is obtained by subtracting the duration T_s of the current pause from the first preset duration T_1; N is obtained by subtracting the duration T_s of the current pause from the second preset duration T_2.
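The initial values and the per-interval update of M and N can be written as a short generator. This is a sketch under stated assumptions: the function name `endpoint_times` and the default values T_1 = 400 ms, T_2 = 1200 ms and K = 50 ms are illustrative, and M is allowed to fall below zero so that the raw recurrence is visible (a display layer could clamp it to zero).

```python
def endpoint_times(first_pause_ms, t1_ms=400, t2_ms=1200, k_ms=50):
    """Yield (M_i, N_i) every K ms while the pause lasts.

    M_0 = T_1 - T_s and N_0 = T_2 - T_s, then M_i = M_{i-1} - K and
    N_i = N_{i-1} - K. Iteration stops once N_i <= 0, i.e. when the
    voice input would end automatically.
    """
    m = t1_ms - first_pause_ms
    n = t2_ms - first_pause_ms
    while True:
        yield m, n
        if n <= 0:
            return
        m -= k_ms
        n -= k_ms
```

In practice the caller would stop iterating as soon as endpoint detection reports that speech has resumed.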
Showing endpoint prompt information to the user according to the calculated endpoint time until the pause ends mainly covers the following cases:
(1) M_i > 0 and N_i > 0: the endpoint prompt information shown to the user includes the values of M_i and N_i.
If the time M_i remaining until the current speech clause ends automatically is greater than 0, the speech signal is still in the paused state and the current speech clause has not yet been judged to have ended. If the time N_i remaining until the current voice input ends automatically is greater than 0, the speech signal is still in the paused state and the voice input has not yet been judged to have ended. Showing the values of M_i and N_i lets the user see directly how much time remains before the current speech clause ends automatically and how much remains before the current voice input ends automatically, so that the user can control the input speed, the moment of pausing and the pause duration.
(2) M_i ≤ 0 and N_i > 0: the endpoint prompt information shown to the user includes clause-end prompt information and the value of N_i.
If M_i ≤ 0, the speech signal is still in the paused state, but the pause duration is greater than or equal to the minimum time interval T_1 between speech clauses, so the speech clause has been judged to have ended. If N_i > 0, the speech signal is still in the paused state and the voice input has not yet been judged to have ended. In this case, clause-end prompt information and the time remaining until the current voice input ends automatically can be shown to the user. The user can then see directly that the current speech clause has ended and how much time remains before the voice input ends automatically, and can control the input speed, the moment of pausing and the pause duration accordingly.
(3) N_i ≤ 0: the endpoint prompt information shown to the user includes prompt information that the current voice input has ended automatically.
If N_i ≤ 0, the speech signal is still in the paused state and the current voice input has been judged to have ended automatically. In this case, prompt information that the voice input has ended automatically can be shown to the user, so that the user can control the input speed, the moment of pausing and the pause duration in later input. Note that once the voice input has ended automatically, the endpoint time no longer needs to be calculated at the preset period; after voice input is restarted, the endpoint time is calculated at the preset period again only when the speech signal is next detected to be in a paused state.
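The three cases above reduce to a small selection function. The sketch below is illustrative only: the function name `endpoint_prompt` and the wording of the returned messages are assumptions, and a real interface could equally drive a progress bar or a prompt tone.

```python
def endpoint_prompt(m_ms, n_ms):
    """Choose the prompt text from the remaining times M and N (in ms)."""
    if n_ms <= 0:
        # Case (3): the voice input has ended automatically.
        return "Voice input ended automatically"
    if m_ms <= 0:
        # Case (2): the clause has ended, the input is still open.
        return f"Clause ended; input ends in {n_ms} ms"
    # Case (1): both countdowns are still running.
    return f"Clause ends in {m_ms} ms; input ends in {n_ms} ms"
```

Case (3) is checked first because N ≤ 0 takes priority regardless of the value of M.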
The endpoint prompt information can be shown to the user in many ways, which can be configured as needed. For example, any one or more of a numeric display, a progress bar and a prompt tone can be used, so that the user can see the recording state at a glance and adjust the input speed, the moment of pausing and the pause duration in time, thereby obtaining a high-quality recording and improving the accuracy of speech recognition.
The technical solution of the embodiments of the present invention is described in detail below through a specific example.
For example, suppose the audio input by the user is: "Today // the weather is fine // I am going on an outing //", where each "//" marks a pause in the speech signal. Assume the pause between "today" and "the weather" lasts 200 ms, the pause between "fine" and "I" lasts 500 ms, and the user stays silent for 1500 ms after "outing". As soon as the user has said "today", the pause begins, and at that moment M = T_1 = 400 ms and N = T_2 = 1200 ms. After 200 ms of pause, M has dropped to 200 ms, meaning that a further 200 ms of pause is needed before the clause "today" is judged to have ended, and N has dropped to 1000 ms, meaning that a further 1000 ms of pause is needed before the current voice input is judged to have ended automatically. However, the user then ends the pause and begins saying "the weather", so neither M nor N reaches 0, and both are restored to their default values (the default can be set to 0). When the user finishes "fine" and pauses, M = T_1 = 400 ms and N = T_2 = 1200 ms again. This pause lasts 500 ms: at 400 ms M reaches 0 and the clause "today the weather is fine" is judged to have ended, but when the 500 ms pause ends N = 700 ms, still above 0, so the voice input is not judged to have ended automatically. After "I am going on an outing" the pause lasts 1500 ms: at 400 ms M reaches 0 and the clause "I am going on an outing" is judged to have ended, and at 1200 ms N reaches 0 and the voice input is judged to have ended automatically; from then on, even if the user keeps speaking, no further speech can be input.
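The worked example can be reproduced with a few lines of code. This is a sketch under assumptions: the function name `trace_pauses` is hypothetical, M and N are clamped at zero for readability, and each pause is evaluated only at its end rather than every K ms.

```python
def trace_pauses(pauses_ms, t1_ms=400, t2_ms=1200):
    """Report the state of M and N at the end of each pause.

    Stops after the first pause that is long enough to end the input.
    """
    trace = []
    for pause in pauses_ms:
        m_left = max(t1_ms - pause, 0)   # time left until the clause ends
        n_left = max(t2_ms - pause, 0)   # time left until the input ends
        trace.append({
            "pause": pause,
            "M": m_left,
            "N": n_left,
            "clause_ended": m_left == 0,
            "input_ended": n_left == 0,
        })
        if n_left == 0:
            break   # the voice input has ended; later pauses are ignored
    return trace
```

Calling `trace_pauses([200, 500, 1500])` reports M = 200 and N = 1000 after the first pause, a clause end with N = 700 after the second, and both a clause end and an input end after the third, matching the example.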
The voice input method provided by the embodiments of the present invention uses endpoint detection to determine whether the speech signal is in a paused state. When it is, endpoint prompt information is shown to the user, so that the user knows how much time remains before the current speech clause ends automatically, can adjust the input speed accordingly, and can pause at a suitable moment. This effectively improves the quality of voice input and thereby the accuracy of speech recognition.
Correspondingly, an embodiment of the present invention also provides a voice input system, whose schematic structural diagram is shown in Fig. 3. The voice input system includes:
a receiving module 301, configured to receive in real time the audio signal produced while a user inputs speech;
an endpoint detection module 302, configured to perform endpoint detection on the audio signal;
a determining module 303, configured to determine, from the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;
a calculation module 304, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state, where the endpoint time includes the time remaining from the current moment until the current speech clause ends automatically;
a display module 305, configured to show endpoint prompt information to the user according to the calculation result of the calculation module until the pause ends.
Further, the endpoint time can also include the time remaining from the current moment until the current voice input ends automatically.
As shown in Fig. 4, the calculation module 304 can include:
a first calculation unit 401, configured to calculate at the preset period, after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current speech clause ends automatically, including: obtaining a first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining from the current moment until the current speech clause ends automatically, the first preset duration being the minimum time interval between speech clauses;
a second calculation unit 402, configured to calculate at the preset period, after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current voice input ends automatically, including: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining from the current moment until the current voice input ends automatically, the second preset duration being the time from the detected end endpoint of the speech until the current voice input ends automatically.
The display module 305 is specifically configured to: when the time remaining until the current speech clause ends automatically and the time remaining until the current voice input ends automatically are both greater than zero, show both remaining times to the user; when the time remaining until the current speech clause ends automatically is less than or equal to zero and the time remaining until the current voice input ends automatically is greater than zero, show clause-end prompt information to the user, together with the time remaining until the current voice input ends automatically; and when the time remaining until the current voice input ends automatically is less than or equal to zero, show the user prompt information that the current voice input has ended automatically.
The display module 305 is specifically configured to show the endpoint prompt information to the user using any one or more of a numeric display, a progress bar and a prompt tone.
The voice input system provided by the embodiments of the present invention uses endpoint detection to determine whether the speech signal is in a paused state. When it is, endpoint prompt information is shown to the user, so that the user knows how much time remains before the current speech clause ends automatically, can adjust the input speed accordingly, and can pause at a suitable moment. This effectively improves the quality of voice input and thereby the accuracy of speech recognition.
Each embodiment in this specification is described by the way of progressive, identical similar portion between each embodiment Divide mutually referring to what each embodiment was stressed is the difference with other embodiment.Especially for system reality For applying example, as which is substantially similar to embodiment of the method, so describing fairly simple, related part is referring to embodiment of the method Part explanation.System embodiment described above is only schematic, wherein described illustrate as separating component Unit can be or may not be physically separate, as the part that unit shows can be or may not be Physical location, you can local to be located at one, or can also be distributed on multiple NEs.Can be according to the actual needs Select some or all of module therein to realize the purpose of this embodiment scheme.Those of ordinary skill in the art are not paying In the case of creative work, you can to understand and implement.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A voice input method, characterized by comprising:
receiving, in real time, an audio signal while a user performs voice input;
performing endpoint detection on the audio signal, and determining from the detection result whether the speech in the audio signal is in a paused state;
if so, calculating an endpoint time at a preset period, and showing endpoint prompt information to the user according to the calculation result, until the current pause ends; the endpoint time comprises: the remaining time from the current moment until the current voice clause ends automatically; wherein calculating the remaining time from the current moment until the current voice clause ends automatically comprises: obtaining a first preset duration and the duration for which the current pause of the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the remaining time from the current moment until the current voice clause ends automatically, the first preset duration being the minimum interval between voice clauses.
2. The method according to claim 1, characterized in that the endpoint time further comprises: the remaining time from the current moment until the current voice input ends automatically.
3. The method according to claim 2, characterized in that calculating the remaining time from the current moment until the current voice input ends automatically comprises: obtaining a second preset duration and the duration for which the current pause of the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the remaining time from the current moment until the current voice input ends automatically;
the second preset duration is the time from detection of the trailing endpoint of the speech until the current voice input ends automatically.
4. The method according to claim 3, characterized in that showing endpoint prompt information to the user according to the calculation result, until the current pause ends, comprises:
if the remaining time from the current moment until the current voice clause ends automatically and the remaining time from the current moment until the current voice input ends automatically are both greater than zero, showing the user both remaining times;
if the remaining time from the current moment until the current voice clause ends automatically is less than or equal to zero, and the remaining time from the current moment until the current voice input ends automatically is greater than zero, showing the user a prompt that the voice clause has ended, and showing the user the remaining time from the current moment until the current voice input ends automatically;
if the remaining time from the current moment until the current voice input ends automatically is less than or equal to zero, showing the user a prompt that the current voice input has ended automatically.
5. The method according to any one of claims 1 to 4, characterized in that showing endpoint prompt information to the user comprises:
showing the endpoint prompt information to the user in any one or more of the following three ways: a numeric display, a progress bar, and a prompt tone.
6. A voice input system, characterized by comprising:
a receiving module, configured to receive, in real time, an audio signal while a user performs voice input;
an endpoint detection module, configured to perform endpoint detection on the audio signal;
a determining module, configured to determine, from the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;
a calculation module, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state; the endpoint time comprises: the remaining time from the current moment until the current voice clause ends automatically;
a display module, configured to show endpoint prompt information to the user according to the calculation result of the calculation module, until the current pause ends;
the calculation module comprises: a first calculation unit, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the remaining time from the current moment until the current voice clause ends automatically, which comprises: obtaining a first preset duration and the duration for which the current pause of the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the remaining time from the current moment until the current voice clause ends automatically, the first preset duration being the minimum interval between voice clauses.
7. The system according to claim 6, characterized in that the endpoint time further comprises: the remaining time from the current moment until the current voice input ends automatically.
8. The system according to claim 7, characterized in that the calculation module further comprises:
a second calculation unit, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the remaining time from the current moment until the current voice input ends automatically, which comprises: obtaining a second preset duration and the duration for which the current pause of the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the remaining time from the current moment until the current voice input ends automatically, the second preset duration being the time from detection of the trailing endpoint of the speech until the current voice input ends automatically.
9. The system according to claim 8, characterized in that:
the display module is specifically configured to: when the remaining time from the current moment until the current voice clause ends automatically and the remaining time from the current moment until the current voice input ends automatically are both greater than zero, show the user both remaining times; when the remaining time from the current moment until the current voice clause ends automatically is less than or equal to zero, and the remaining time from the current moment until the current voice input ends automatically is greater than zero, show the user a prompt that the voice clause has ended, and show the user the remaining time from the current moment until the current voice input ends automatically; and when the remaining time from the current moment until the current voice input ends automatically is less than or equal to zero, show the user a prompt that the current voice input has ended automatically.
10. The system according to any one of claims 6 to 9, characterized in that:
the display module is specifically configured to show the endpoint prompt information to the user in any one or more of the following three ways: a numeric display, a progress bar, and a prompt tone.
CN201410265393.9A 2014-06-13 2014-06-13 A kind of voice typing method and system Active CN104078076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410265393.9A CN104078076B (en) 2014-06-13 2014-06-13 A kind of voice typing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410265393.9A CN104078076B (en) 2014-06-13 2014-06-13 A kind of voice typing method and system

Publications (2)

Publication Number Publication Date
CN104078076A CN104078076A (en) 2014-10-01
CN104078076B true CN104078076B (en) 2017-04-05

Family

ID=51599290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410265393.9A Active CN104078076B (en) 2014-06-13 2014-06-13 A kind of voice typing method and system

Country Status (1)

Country Link
CN (1) CN104078076B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139868A (en) * 2015-07-28 2015-12-09 苏州宏展信息科技有限公司 speech frequency automatic compensation control method for recording pen
CN110875033A (en) * 2018-09-04 2020-03-10 蔚来汽车有限公司 Method, apparatus, and computer storage medium for determining a voice end point
CN109360551B (en) * 2018-10-25 2021-02-05 珠海格力电器股份有限公司 Voice recognition method and device
CN109859773A (en) * 2019-02-14 2019-06-07 北京儒博科技有限公司 A kind of method for recording of sound, device, storage medium and electronic equipment
CN110970054B (en) * 2019-11-06 2022-06-24 广州视源电子科技股份有限公司 Method and device for automatically stopping voice acquisition, terminal equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6873953B1 (en) * 2000-05-22 2005-03-29 Nuance Communications Prosody based endpoint detection
CN101308653A (en) * 2008-07-17 2008-11-19 安徽科大讯飞信息科技股份有限公司 End-point detecting method applied to speech identification system
CN101588415A (en) * 2009-06-29 2009-11-25 中国农业大学 Voice service method and voice service system
CN102231278A (en) * 2011-06-10 2011-11-02 安徽科大讯飞信息科技股份有限公司 Method and system for realizing automatic addition of punctuation marks in speech recognition
WO2012055113A1 (en) * 2010-10-29 2012-05-03 安徽科大讯飞信息科技股份有限公司 Method and system for endpoint automatic detection of audio record
CN103559907A (en) * 2013-10-25 2014-02-05 广州华多网络科技有限公司 Recording method, device and terminal

Also Published As

Publication number Publication date
CN104078076A (en) 2014-10-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant