CN104078076B - A kind of voice typing method and system - Google Patents
A voice input method and system
- Publication number
- CN104078076B (application CN201410265393.9A)
- Authority
- CN
- China
- Prior art keywords
- voice
- time
- automatically
- current
- typing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Electrically Operated Instructional Devices (AREA)
- Electric Clocks (AREA)
Abstract
The invention discloses a voice input method and system, belonging to the technical field of voice input. The voice input method includes: receiving, in real time, the audio signal produced while the user inputs speech; performing endpoint detection on the audio signal, and determining from the detection result whether the speech in the audio signal is in a paused state; if so, calculating an endpoint time at a preset period and showing endpoint prompt information to the user according to the calculation result until the pause ends. The endpoint time includes the time remaining from the current moment until the current speech clause ends automatically. The voice input method and system can effectively improve voice input quality and thereby the accuracy of speech recognition.
Description
Technical field
The present invention relates to the technical field of voice input, and more particularly to a voice input method and system.
Background art
After many years of technical development, voice input has become an important non-keyboard input method and is widely used on PCs, smartphones, and other portable devices. Typically, after obtaining the speech input by the user, a speech recognition system decodes the speech signal into a text string and feeds it back to the user. The accuracy of speech recognition depends heavily on the quality of the voice input. In general, the more standard the accent of the input speech, the steadier its speed, the more accurate its pauses, and the more moderate its volume, the higher the speech quality and, correspondingly, the higher the accuracy of speech recognition.
Fig. 1 shows a flow chart of a voice input method in the prior art.
A prior-art voice input method generally includes the following steps:
Step 101: After receiving the user's recording start instruction, begin receiving, in real time, the audio signal produced while the user inputs speech.
The recording start instruction is usually the user triggering a start-recording button; the user can press the start button manually to begin recording.
Step 102: Perform speech analysis on the audio signal and show the analysis result to the user.
Here, speech analysis mainly means analyzing the speech volume or the signal amplitude (which indicates how loud the sound is). The volume level is represented by the number of energy bars on an indicator, so that the user can control the volume while inputting speech.
Step 103: If the user's end-of-recording instruction is received, stop voice input; otherwise continue voice input.
The end-of-recording instruction is usually the user triggering an end-recording button; the user can press the end button manually to stop voice input. Alternatively, a preset endpoint detection module can judge automatically whether to end the recording.
In prior-art voice input methods, the speech analysis result generally contains only volume-related information. The user can adjust only the input volume according to the analysis result, cannot control the voice input speed, and does not know when to pause. An unsuitable voice input speed therefore easily leads to poor voice input quality, so that speech recognition cannot be performed or its accuracy is low.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a voice input method and system that can effectively improve voice input quality and thereby the accuracy of speech recognition.
The technical solutions provided by the embodiments of the present invention are as follows:
In one aspect, a voice input method is provided, including:
receiving, in real time, the audio signal produced while the user inputs speech;
performing endpoint detection on the audio signal, and determining from the detection result whether the speech in the audio signal is in a paused state;
if so, calculating an endpoint time at a preset period, and showing endpoint prompt information to the user according to the calculation result until the pause ends; the endpoint time includes the time remaining from the current moment until the current speech clause ends automatically.
Preferably, the endpoint time also includes the time remaining from the current moment until this voice input ends automatically.
Preferably, calculating the endpoint time at the preset period includes: calculating the time remaining from the current moment until the current speech clause ends automatically, and the time remaining from the current moment until this voice input ends automatically;
calculating the time remaining from the current moment until the current speech clause ends automatically includes: obtaining a first preset duration and the duration for which the current speech-signal pause has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining until the current speech clause ends automatically;
calculating the time remaining from the current moment until this voice input ends automatically includes: obtaining a second preset duration and the duration for which the current speech-signal pause has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining until this voice input ends automatically;
the first preset duration is the minimum time interval between speech clauses; the second preset duration is the time from the detected end endpoint of the speech until this voice input ends automatically.
Preferably, showing endpoint prompt information to the user according to the calculation result until the pause ends includes:
if the time remaining until the current speech clause ends automatically and the time remaining until this voice input ends automatically are both greater than zero, showing both remaining times to the user;
if the time remaining until the current speech clause ends automatically is less than or equal to zero and the time remaining until this voice input ends automatically is greater than zero, showing the user a prompt that the speech clause has ended, together with the time remaining until this voice input ends automatically;
if the time remaining until this voice input ends automatically is less than or equal to zero, showing the user a prompt that this voice input has ended automatically.
Preferably, showing endpoint prompt information to the user includes:
showing the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, and a prompt tone.
In another aspect, a voice input system is provided, including:
a receiving module, configured to receive, in real time, the audio signal produced while the user inputs speech;
an endpoint detection module, configured to perform endpoint detection on the audio signal;
a determining module, configured to determine, from the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;
a computing module, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state; the endpoint time includes the time remaining from the current moment until the current speech clause ends automatically;
a display module, configured to show endpoint prompt information to the user according to the calculation result of the computing module until the pause ends.
Preferably, the endpoint time also includes the time remaining from the current moment until this voice input ends automatically.
Preferably, the computing module includes:
a first computing unit, configured to calculate, at the preset period and after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current speech clause ends automatically, which includes: obtaining a first preset duration and the duration for which the current speech-signal pause has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining until the current speech clause ends automatically, the first preset duration being the minimum time interval between speech clauses;
a second computing unit, configured to calculate, at the preset period and after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until this voice input ends automatically, which includes: obtaining a second preset duration and the duration for which the current speech-signal pause has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining until this voice input ends automatically, the second preset duration being the time from the detected end endpoint of the speech until this voice input ends automatically.
Preferably, the display module is specifically configured to: when the time remaining until the current speech clause ends automatically and the time remaining until this voice input ends automatically are both greater than zero, show both remaining times to the user; when the time remaining until the current speech clause ends automatically is less than or equal to zero and the time remaining until this voice input ends automatically is greater than zero, show the user a prompt that the speech clause has ended, together with the time remaining until this voice input ends automatically; and when the time remaining until this voice input ends automatically is less than or equal to zero, show the user a prompt that this voice input has ended automatically.
Preferably, the display module is specifically configured to show the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, and a prompt tone.
The voice input method and system provided by the embodiments of the present invention use endpoint detection to determine whether the speech signal is in a paused state. While it is paused, endpoint prompt information is shown to the user, so that the user knows the time remaining from the current moment until the current speech clause ends automatically. The user can then adjust the voice input speed and pause at suitable moments, which effectively improves voice input quality and thereby the accuracy of speech recognition.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Evidently, the drawings described below show only some of the embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flow chart of a voice input method in the prior art;
Fig. 2 is a flow chart of the voice input method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the voice input system provided by an embodiment of the present invention;
Fig. 4 is another structural diagram of the voice input system provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a voice input method and system. By showing endpoint prompt information to the user, they let the user adjust the voice input speed and reasonably control when to pause and for how long, which effectively improves voice input quality and thereby the accuracy of speech recognition.
Fig. 2 shows a flow chart of a voice input method provided by an embodiment of the present invention, which includes the following steps:
Step 201: Receive, in real time, the audio signal produced while the user inputs speech.
Step 202: Perform endpoint detection on the audio signal, and determine from the detection result whether the speech in the audio signal is in a paused state.
Because the speech signal in the audio signal is short-term stationary, the audio signal can be divided into frames: the whole audio is split into sub-segments of a specific length, which ensures the spectral continuity of each sub-segment. Since only a limited length of audio can be processed at a time, the audio signal also needs to be windowed, so that the signal handled each time is confined to one window. A Hamming window or a Hanning window, for example, can be applied. Preferably, each frame of sub-segment audio is 25 ms long and the frame shift is 10 ms. For audio of a specific length, framing and windowing yield multiple speech frames. A speech frame is the smallest unit on which the speech/non-speech decision is made in the audio signal.
Endpoint detection essentially computes feature information of each resulting speech frame, such as time-domain energy, frequency-domain energy, or zero-crossing rate, in order to distinguish speech from non-speech, where non-speech may be either silence or noise. For an audio signal recorded in a quiet environment, the energy of speech segments is usually higher than that of non-speech segments, and the zero-crossing rate of the speech signal is usually lower than that of the non-speech signal. The zero-crossing rate is the number of times per unit time that the sampled audio signal value crosses zero (changes from positive to negative or from negative to positive). By computing these features, speech and non-speech can be distinguished effectively, so that the current audio signal can be judged to be a speech signal or a non-speech signal. When the current audio signal is judged to be a non-speech signal, the speech in the audio signal is considered to be in a paused state. Endpoint detection can therefore effectively identify the start endpoint and the end endpoint of the speech in the audio signal.
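The framing, windowing, and feature steps described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the 16 kHz sample rate used in the usage note, and the energy threshold are all assumptions, and a practical detector would normally combine several features and adapt its thresholds to the background noise.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=25, shift_ms=10):
    """Split samples into overlapping frames and apply a Hamming window."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    # Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def short_time_energy(frame):
    """Mean squared amplitude of one frame (time-domain energy)."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossings(frame):
    """Number of sign changes between consecutive samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def is_speech(frame, energy_threshold=1e-3):
    """Hypothetical decision rule: energy above a fixed, assumed threshold."""
    return short_time_energy(frame) > energy_threshold
```

With a 16 kHz signal, `frame_signal` produces frames of 400 samples that start every 160 samples, matching the preferred 25 ms frame length and 10 ms frame shift.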
Step 203: If so, calculate the endpoint time at the preset period, and show endpoint prompt information to the user according to the calculation result until the pause ends.
If the endpoint detection result shows that the speech in the audio signal has not paused, information indicating that the speech signal has not paused may also be fed back to the user at the preset period, so that on seeing it the user knows that no pause has occurred.
When the speech signal is detected to have paused for some time and then left the paused state because voice input continues, the endpoint time can be restored to its default value (for example, reset). When a pause in the speech signal is detected again, the updated endpoint time is again calculated at the preset period. Whether the user has continued voice input can be determined by the endpoint detection described above: if the detection result shows that the speech signal left the paused state after pausing for a while, the user is considered to have continued voice input; otherwise, the speech signal is considered to remain in the paused state.
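The reset behaviour described above can be pictured as a small state tracker driven by per-frame speech/non-speech decisions. The sketch below is only an assumption about how this might be organized; the class name `PauseTracker` and the choice of 0 as the "not paused" value are hypothetical.

```python
class PauseTracker:
    """Tracks how long the current pause has lasted, one frame at a time."""

    def __init__(self, frame_shift_ms=10):
        self.frame_shift_ms = frame_shift_ms
        self.pause_ms = 0  # 0 is used here to mean "not paused"

    def update(self, frame_is_speech):
        """Feed one speech/non-speech decision; return the pause duration."""
        if frame_is_speech:
            # Speech resumed: the pause is over, so the counter is reset.
            self.pause_ms = 0
        else:
            # Still paused: the pause grows by one frame shift.
            self.pause_ms += self.frame_shift_ms
        return self.pause_ms

    @property
    def paused(self):
        return self.pause_ms > 0
```

Feeding the decisions speech, non-speech, non-speech, speech, non-speech gives pause durations of 0, 10, 20, 0, and 10 ms, so each new pause starts counting from scratch.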
The endpoint time may include the time remaining from the current moment until the current speech clause ends automatically, denoted M ms (milliseconds). Because the data processing speed is fixed, the amount of audio data that can be processed at a time can be converted into a time length, denoted K ms. From the moment the speech signal pauses until the pause ends, a new endpoint time is calculated and fed back every K ms, and endpoint prompt information is shown to the user at the same time. In the embodiments of the present invention, for ease of description, K is called the feedback interval or the preset period. The endpoint time M tells how much longer, counted from the current moment, the speech signal must stay paused before the current speech clause ends automatically.
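As an illustration of converting a data volume into the time length K, the sketch below assumes uncompressed PCM audio. The sample rate, sample width, and channel count are assumed values that do not come from the source.

```python
def chunk_duration_ms(num_bytes, sample_rate=16000, sample_width=2, channels=1):
    """Duration in milliseconds covered by num_bytes of raw PCM audio.

    All default parameters are assumptions (16 kHz, 16-bit, mono).
    """
    samples = num_bytes / (sample_width * channels)
    return 1000.0 * samples / sample_rate
```

Under these assumptions, a processing chunk of 1600 bytes corresponds to K = 50 ms.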
The endpoint time may also include the time remaining from the current moment until this voice input ends automatically, denoted N ms. The endpoint time N tells how much longer, counted from the current moment, the speech signal must stay paused before this voice input ends automatically. Preferably, N ≥ M.
In the embodiments of the present invention, two durations can be set in advance: a first preset duration T1 and a second preset duration T2. The first preset duration T1 is the minimum time interval between speech clauses, and the second preset duration T2 is the time from the detected end endpoint of the speech until this voice input ends automatically, so that 0 ≤ M ≤ T1 and 0 ≤ N ≤ T2. After the speech signal pauses, the duration for which the pause has lasted is denoted Ts. If Ts is greater than or equal to T1, the speech signals before and after the pause are judged to belong to different speech clauses; if Ts is less than T1, they are judged to belong to the same speech clause; if Ts is greater than or equal to T2, this voice input is judged to end automatically. Preferably, T1 can be set to 300 to 400 ms, T2 to 1000 to 2000 ms, and K to 50 ms.
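The three decisions on the pause duration Ts can be written as a short function. This is a sketch only: the function name and the returned labels are hypothetical, and the defaults T1 = 400 ms and T2 = 1200 ms are simply values picked from within the preferred ranges.

```python
def classify_pause(ts_ms, t1_ms=400, t2_ms=1200):
    """Classify a pause of ts_ms milliseconds against the two preset durations."""
    if ts_ms >= t2_ms:
        return "input_ended"   # Ts >= T2: this voice input ends automatically
    if ts_ms >= t1_ms:
        return "new_clause"    # T1 <= Ts < T2: speech before/after is in different clauses
    return "same_clause"       # Ts < T1: speech before/after is in the same clause
```

Checking T2 first keeps the result unambiguous, since any pause that reaches T2 has also reached T1.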
When the speech signal pauses, the pause duration Ts at the first feedback does not exceed the feedback interval K, so Ts ≤ K. Since the pause duration at the first feedback of the endpoint time is Ts, the initial feedback value of M is M_0 = T1 - Ts and the initial feedback value of N is N_0 = T2 - Ts. Thereafter, as long as the speech signal remains paused, M and N are updated every K ms as follows: M_i = M_{i-1} - K, N_i = N_{i-1} - K.
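The initial values and the periodic update can be sketched as below. The function name and signature are hypothetical; only the formulas M_0 = T1 - Ts, N_0 = T2 - Ts and the per-period subtraction of K come from the text.

```python
def countdown(t1_ms, t2_ms, first_ts_ms, k_ms, feedbacks):
    """Return the first `feedbacks` pairs (M_i, N_i) for one continuous pause."""
    if not 0 <= first_ts_ms <= k_ms:
        raise ValueError("the first reported pause duration must satisfy 0 <= Ts <= K")
    # Initial feedback values: M_0 = T1 - Ts, N_0 = T2 - Ts
    m, n = t1_ms - first_ts_ms, t2_ms - first_ts_ms
    values = [(m, n)]
    for _ in range(feedbacks - 1):
        # Every K ms: M_i = M_{i-1} - K, N_i = N_{i-1} - K
        m, n = m - k_ms, n - k_ms
        values.append((m, n))
    return values
```

For example, with T1 = 400 ms, T2 = 1200 ms, Ts = 50 ms, and K = 50 ms, the first three feedback values are (350, 1150), (300, 1100), and (250, 1050).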
Calculating the endpoint time at the preset period includes calculating the time M remaining from the current moment until the current speech clause ends automatically and the time N remaining from the current moment until this voice input ends automatically. M is obtained by subtracting the duration Ts of the current speech-signal pause from the first preset duration T1; N is obtained by subtracting Ts from the second preset duration T2.
Showing endpoint prompt information to the user according to the calculated endpoint time until the pause ends mainly covers the following cases:
(1) M_i > 0 and N_i > 0: the endpoint prompt information shown to the user includes the values of M_i and N_i.
When the time M_i remaining until the current speech clause ends automatically is greater than 0, the speech signal is considered to be still paused, and the judgment that the current speech clause has ended automatically has not yet been made. When the time N_i remaining until this voice input ends automatically is greater than 0, the speech signal is considered to be still paused, and the judgment that this voice input has ended automatically has not yet been made. Showing the values of M_i and N_i at this point lets the user see directly how much time remains before the current speech clause ends automatically and how much time remains before this voice input ends automatically, so that the user can control the voice input speed, the moment of pausing, and the pause duration.
(2) M_i ≤ 0 and N_i > 0: the endpoint prompt information shown to the user includes a prompt that the speech clause has ended and the value of N_i.
When the time M_i remaining until the current speech clause ends automatically is less than or equal to 0, the speech signal is considered to be still paused, but the pause duration is greater than or equal to the minimum time interval T1 between speech clauses, so the judgment that the speech clause has ended has been made. When the time N_i remaining until this voice input ends automatically is greater than 0, the speech signal is considered to be still paused, and the judgment that this voice input has ended automatically has not yet been made. At this point, the user can be shown a prompt that the speech clause has ended together with the time remaining until this voice input ends automatically. The user can thus see directly how much time remains before this voice input ends automatically and can control the voice input speed, the moment of pausing, and the pause duration.
(3) N_i ≤ 0: the endpoint prompt information shown to the user includes a prompt that this voice input has ended automatically.
When the time N_i remaining until this voice input ends automatically is less than or equal to 0, the speech signal is considered to be still paused, and the judgment that this voice input has ended automatically has been made. At this point, the user can be shown a prompt that this voice input has ended automatically, so that the user can control the voice input speed, the moment of pausing, and the pause duration. Note that after this voice input has ended automatically, the endpoint time need no longer be calculated at the preset period; once voice input is restarted, calculation at the preset period resumes when the speech signal is again detected to be in a paused state.
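The three cases above can be summarized in one selection function. The name `select_prompt` and the shape of the returned tuple are assumptions made for illustration.

```python
def select_prompt(m_ms, n_ms):
    """Return (case, clause_remaining_ms, input_remaining_ms) for one feedback.

    A remaining time is None when it is no longer shown to the user.
    """
    if n_ms <= 0:
        # Case (3): this voice input has ended automatically.
        return (3, None, None)
    if m_ms <= 0:
        # Case (2): the clause has ended; only N is still counting down.
        return (2, None, n_ms)
    # Case (1): both countdowns are still running.
    return (1, m_ms, n_ms)
```

The check on N comes first so that case (3) is selected whenever this voice input has ended, regardless of the value of M.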
The endpoint prompt information can be shown to the user in many ways, which can be configured as needed. For example, any one or more of a numeric display, a progress bar, and a prompt tone can be used, so that the user can understand the recording state at a glance and promptly adjust the voice input speed, the moment of pausing, and the pause duration, thereby obtaining a high-quality recording and improving the accuracy of speech recognition.
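As one possible rendering of the numeric display and progress bar, the sketch below draws a text bar followed by the remaining time. The function name, the bar characters, and the default width are all arbitrary assumptions; a real interface would use graphical widgets and possibly a prompt tone.

```python
def render_bar(remaining_ms, total_ms, width=20):
    """Render a remaining time as a text progress bar plus a numeric value."""
    # Clamp so that values outside [0, total_ms] still render sensibly.
    remaining_ms = max(0, min(remaining_ms, total_ms))
    filled = width * remaining_ms // total_ms
    return "[" + "#" * filled + "-" * (width - filled) + "] " + str(remaining_ms) + " ms"
```

For example, `render_bar(200, 400, width=10)` yields `[#####-----] 200 ms`, a half-full bar for a clause countdown that is halfway to zero.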
The technical solution of the embodiments of the present invention is described in detail below through a specific example.
For example, suppose the audio input by the user is: "Today // the weather is very good // I am going on an outing //", where "//" marks a position at which the speech signal pauses. Assume that the pause between "today" and "the weather" lasts 200 ms, the pause between "very good" and "I" lasts 500 ms, and the user stays paused for 1500 ms after "outing". Right after the user says "today", the pause begins; at that moment M = T1 = 400 ms and N = T2 = 1200 ms. After 200 ms of pause, M has fallen to 200 ms, meaning that another 200 ms of pause is needed before the clause "today" is judged to have ended, and N has fallen to 1000 ms, meaning that another 1000 ms of pause is needed before this voice input is judged to end automatically. However, because the user ends the pause and begins to say "the weather", neither M nor N reaches 0, and M and N are restored to their default values (the default value can be set to 0) until "very good" has been said and the next pause begins. At that moment M = T1 = 400 ms and N = T2 = 1200 ms again. The pause then lasts 500 ms: when it reaches 400 ms, M falls to 0 and the clause "today the weather is very good" is judged to have ended, but at the end of the 500 ms pause N = 700 ms, which is still above 0, so this voice input is not judged to end automatically. After "I am going on an outing" has been said, a 1500 ms pause follows: when it reaches 400 ms, M falls to 0 and the clause "I am going on an outing" is judged to have ended; when it reaches 1200 ms, N falls to 0 and this voice input is judged to end automatically, after which speech can no longer be input even if the user goes on speaking.
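The numbers in this example can be replayed with a short simulation. It is a sketch under the example's simplifications: the countdown starts from M = T1 and N = T2 at the moment the pause begins, pause lengths are multiples of K = 50 ms, and the function name and return shape are hypothetical.

```python
def replay_pause(pause_ms, t1_ms=400, t2_ms=1200, k_ms=50):
    """Step through one pause in K ms increments.

    Returns (clause_end_at, input_end_at, final_m, final_n), where the first
    two entries are elapsed pause times in ms, or None if not reached.
    """
    m, n = t1_ms, t2_ms
    clause_end_at = None
    input_end_at = None
    elapsed = 0
    while elapsed < pause_ms and input_end_at is None:
        elapsed += k_ms
        m = max(0, m - k_ms)
        n = max(0, n - k_ms)
        if m == 0 and clause_end_at is None:
            clause_end_at = elapsed   # clause-end judgment is made here
        if n == 0:
            input_end_at = elapsed    # voice input ends automatically here
    return clause_end_at, input_end_at, m, n


for pause in (200, 500, 1500):
    print(pause, replay_pause(pause))
```

The three pauses reproduce the values in the example: after 200 ms, M = 200 and N = 1000 with no judgment made; after 500 ms, the clause ended at 400 ms and N = 700; for the 1500 ms pause, the clause ends at 400 ms and the voice input ends at 1200 ms.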
The voice input method provided by the embodiments of the present invention uses endpoint detection to determine whether the speech signal is in a paused state. While it is paused, endpoint prompt information is shown to the user, so that the user knows the time remaining from the current moment until the current speech clause ends automatically. The user can then adjust the voice input speed and pause at suitable moments, which effectively improves voice input quality and thereby the accuracy of speech recognition.
Correspondingly, the embodiments of the present invention also provide a voice input system, whose structure is shown in Fig. 3. The voice input system includes:
a receiving module 301, configured to receive, in real time, the audio signal produced while the user inputs speech;
an endpoint detection module 302, configured to perform endpoint detection on the audio signal;
a determining module 303, configured to determine, from the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;
a computing module 304, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state, where the endpoint time includes the time remaining from the current moment until the current speech clause ends automatically;
a display module 305, configured to show endpoint prompt information to the user according to the calculation result of the computing module until the pause ends.
Further, the endpoint time may also include the time remaining from the current moment until this voice input ends automatically.
As shown in Fig. 4, the computing module 304 may include:
a first computing unit 401, configured to calculate, at the preset period and after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current speech clause ends automatically, which includes: obtaining the first preset duration and the duration for which the current speech-signal pause has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining until the current speech clause ends automatically, the first preset duration being the minimum time interval between speech clauses;
a second computing unit 402, configured to calculate, at the preset period and after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until this voice input ends automatically, which includes: obtaining the second preset duration and the duration for which the current speech-signal pause has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining until this voice input ends automatically, the second preset duration being the time from the detected end endpoint of the speech until this voice input ends automatically.
The display module 305 is specifically configured to: when the time remaining until the current speech clause ends automatically and the time remaining until this voice input ends automatically are both greater than zero, show both remaining times to the user; when the time remaining until the current speech clause ends automatically is less than or equal to zero and the time remaining until this voice input ends automatically is greater than zero, show the user a prompt that the speech clause has ended, together with the time remaining until this voice input ends automatically; and when the time remaining until this voice input ends automatically is less than or equal to zero, show the user a prompt that this voice input has ended automatically.
The display module 305 is specifically configured to show the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, and a prompt tone.
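One way to picture how modules 301 to 305 cooperate is the sketch below, in which endpoint detection and display are injected as plain callables. The class name, field names, message wording, and the assumption that each received chunk covers exactly K ms are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceInputSystem:
    detect: Callable[[List[float]], bool]  # endpoint detection: True if the chunk is speech
    show: Callable[[str], None]            # display: presents a prompt to the user
    t1_ms: int = 400                       # first preset duration T1 (assumed value)
    t2_ms: int = 1200                      # second preset duration T2 (assumed value)
    k_ms: int = 50                         # preset period K (assumed value)
    pause_ms: int = 0                      # duration of the current pause

    def receive(self, chunk: List[float]) -> bool:
        """Receiving step: handle one K ms chunk; return False once input has ended."""
        if self.detect(chunk):
            # Determining step: speech is present, so the pause state is reset.
            self.pause_ms = 0
            return True
        self.pause_ms += self.k_ms
        # Computing step: remaining times for the clause and for the whole input.
        m = self.t1_ms - self.pause_ms
        n = self.t2_ms - self.pause_ms
        if n <= 0:
            self.show("voice input ended")
            return False
        if m <= 0:
            self.show(f"clause ended; input ends in {n} ms")
        else:
            self.show(f"clause ends in {m} ms; input ends in {n} ms")
        return True
```

Passing the detector and the display as callables keeps the sketch independent of any particular audio front end or user interface, for example `VoiceInputSystem(detect=my_detector, show=print)` with a detector of the caller's choosing.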
The voice input system provided by the embodiments of the present invention uses endpoint detection to determine whether the speech signal is in a paused state. While it is paused, endpoint prompt information is shown to the user, so that the user knows the time remaining from the current moment until the current speech clause ends automatically. The user can then adjust the voice input speed and pause at suitable moments, which effectively improves voice input quality and thereby the accuracy of speech recognition.
The embodiments in this specification are described progressively. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for related details, see the description of the method embodiment. The system embodiment described above is only illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A voice input method, characterized by including:
receiving, in real time, the audio signal produced while the user inputs speech;
performing endpoint detection on the audio signal, and determining from the detection result whether the speech in the audio signal is in a paused state;
if so, calculating an endpoint time at a preset period, and showing endpoint prompt information to the user according to the calculation result until the pause ends; the endpoint time includes the time remaining from the current moment until the current speech clause ends automatically; wherein calculating the time remaining from the current moment until the current speech clause ends automatically includes: obtaining a first preset duration and the duration for which the current speech-signal pause has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining until the current speech clause ends automatically, the first preset duration being the minimum time interval between speech clauses.
2. The method according to claim 1, characterized in that the endpoint time further comprises: the remaining time from the current moment until the current voice input ends automatically.
3. The method according to claim 2, characterized in that calculating the remaining time from the current moment until the current voice input ends automatically comprises: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the remaining time from the current moment until the current voice input ends automatically;
the second preset duration is the time from detection of the trailing endpoint of the speech until the current voice input ends automatically.
4. The method according to claim 3, characterized in that presenting endpoint prompt information to the user according to the calculation result, until the current pause ends, comprises:
if the remaining time from the current moment until the current speech clause ends automatically and the remaining time from the current moment until the current voice input ends automatically are both greater than zero, presenting both remaining times to the user;
if the remaining time from the current moment until the current speech clause ends automatically is less than or equal to zero, and the remaining time from the current moment until the current voice input ends automatically is greater than zero, presenting to the user a prompt that the speech clause has ended, together with the remaining time from the current moment until the current voice input ends automatically;
if the remaining time from the current moment until the current voice input ends automatically is less than or equal to zero, presenting to the user a prompt that the current voice input has ended automatically.
5. The method according to any one of claims 1 to 4, characterized in that presenting endpoint prompt information to the user comprises:
presenting the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, or a prompt tone.
6. A voice input system, characterized by comprising:
a receiving module, configured to receive, in real time, an audio signal while a user performs voice input;
an endpoint detection module, configured to perform endpoint detection on the audio signal;
a determining module, configured to determine, according to the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;
a calculation module, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state; the endpoint time comprises: the remaining time from the current moment until the current speech clause ends automatically;
a display module, configured to present endpoint prompt information to the user according to the calculation result of the calculation module, until the current pause ends;
the calculation module comprises: a first calculation unit, configured to calculate, at the preset period and after the determining module determines that the speech in the audio signal is in a paused state, the remaining time from the current moment until the current speech clause ends automatically, which comprises: obtaining a first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the remaining time from the current moment until the current speech clause ends automatically, the first preset duration being the minimum interval between speech clauses.
7. The system according to claim 6, characterized in that the endpoint time further comprises: the remaining time from the current moment until the current voice input ends automatically.
8. The system according to claim 7, characterized in that the calculation module further comprises:
a second calculation unit, configured to calculate, at the preset period and after the determining module determines that the speech in the audio signal is in a paused state, the remaining time from the current moment until the current voice input ends automatically, which comprises: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the remaining time from the current moment until the current voice input ends automatically, the second preset duration being the time from detection of the trailing endpoint of the speech until the current voice input ends automatically.
9. The system according to claim 8, characterized in that:
the display module is specifically configured to: when the remaining time from the current moment until the current speech clause ends automatically and the remaining time from the current moment until the current voice input ends automatically are both greater than zero, present both remaining times to the user; when the remaining time from the current moment until the current speech clause ends automatically is less than or equal to zero and the remaining time from the current moment until the current voice input ends automatically is greater than zero, present to the user a prompt that the speech clause has ended, together with the remaining time from the current moment until the current voice input ends automatically; and when the remaining time from the current moment until the current voice input ends automatically is less than or equal to zero, present to the user a prompt that the current voice input has ended automatically.
10. The system according to any one of claims 6 to 9, characterized in that:
the display module is specifically configured to present the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, or a prompt tone.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410265393.9A CN104078076B (en) | 2014-06-13 | 2014-06-13 | A kind of voice typing method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104078076A CN104078076A (en) | 2014-10-01 |
CN104078076B true CN104078076B (en) | 2017-04-05 |
Family
ID=51599290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410265393.9A Active CN104078076B (en) | 2014-06-13 | 2014-06-13 | A kind of voice typing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104078076B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6873953B1 (en) * | 2000-05-22 | 2005-03-29 | Nuance Communications | Prosody based endpoint detection |
CN101308653A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | End-point detecting method applied to speech identification system |
CN101588415A (en) * | 2009-06-29 | 2009-11-25 | 中国农业大学 | Voice service method and voice service system |
WO2012055113A1 (en) * | 2010-10-29 | 2012-05-03 | 安徽科大讯飞信息科技股份有限公司 | Method and system for endpoint automatic detection of audio record |
CN102231278A (en) * | 2011-06-10 | 2011-11-02 | 安徽科大讯飞信息科技股份有限公司 | Method and system for realizing automatic addition of punctuation marks in speech recognition |
CN103559907A (en) * | 2013-10-25 | 2014-02-05 | 广州华多网络科技有限公司 | Recording method, device and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||