CN104078076B

CN104078076B - Voice input method and system

Info

Publication number: CN104078076B
Application number: CN201410265393.9A
Authority: CN
Inventors: 潘青华; 钱柄桦; 何婷婷; 王智国; 胡郁; 刘庆峰
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2014-06-13
Filing date: 2014-06-13
Publication date: 2017-04-05
Anticipated expiration: 2034-06-13
Also published as: CN104078076A

Abstract

The invention discloses a voice input method and system, belonging to the technical field of voice input. The voice input method includes: receiving, in real time, the audio signal produced while a user inputs speech; performing endpoint detection on the audio signal and determining from the detection result whether the speech in the audio signal is in a paused state; if so, calculating an endpoint time at a preset period and showing endpoint prompt information to the user according to the calculation result until the pause ends. The endpoint time includes the time remaining from the current moment until the current voice clause ends automatically. The voice input method and system can effectively improve voice input quality and thereby improve the accuracy of speech recognition.

Description

Voice input method and system

Technical field

The present invention relates to the technical field of voice input, and in particular to a voice input method and system.

Background

After many years of technological development, voice input, as an important non-keyboard input method, is widely used on PCs and on portable devices such as smartphones. Typically, after obtaining the speech input by a user, a speech recognition system decodes the speech signal into a text string and feeds it back to the user. The accuracy of speech recognition depends heavily on the quality of the voice input. In general, the more standard the accent, the steadier the speaking rate, the more accurate the pauses, and the more moderate the volume, the higher the voice quality and, correspondingly, the higher the recognition accuracy.

Fig. 1 shows a flowchart of a voice input method in the prior art.

A prior-art voice input method generally includes the following steps:

Step 101: After a recording start instruction from the user is received, begin receiving, in real time, the audio signal produced while the user inputs speech.

The recording start instruction is usually the user's triggering of a recording start button; the user can manually press the start button to begin recording.

Step 102: Perform speech analysis on the audio signal and show the analysis result to the user.

The speech analysis mainly analyzes the speech volume or signal amplitude (which indicates loudness). The number of energy bars on an indicator represents the volume level, so that the user can control the volume while inputting speech.

Step 103: If a recording end instruction from the user is received, stop the voice input; otherwise continue the voice input.

The recording end instruction is usually the user's triggering of a recording end button; the user can manually press the end button to stop the voice input. Alternatively, a preset endpoint detection module can judge automatically whether the recording should end.

In the prior-art voice input method, the speech analysis result generally contains only volume-related information. The user can therefore adjust only the input volume according to the analysis result, cannot control the speaking rate, and does not know when to pause. An unsuitable speaking rate easily leads to poor voice input quality, so that speech recognition either fails or has low accuracy.

Summary of the invention

An object of the embodiments of the present invention is to provide a voice input method and system that can effectively improve voice input quality and thereby improve the accuracy of speech recognition.

The technical solutions provided by the embodiments of the present invention are as follows:

In one aspect, a voice input method is provided, including:

receiving, in real time, the audio signal produced while a user inputs speech;

performing endpoint detection on the audio signal, and determining from the detection result whether the speech in the audio signal is in a paused state;

if so, calculating an endpoint time at a preset period, and showing endpoint prompt information to the user according to the calculation result until the pause ends; the endpoint time includes the time remaining from the current moment until the current voice clause ends automatically.

Preferably, the endpoint time further includes the time remaining from the current moment until this voice input ends automatically.

Preferably, calculating the endpoint time at the preset period includes: calculating the time remaining from the current moment until the current voice clause ends automatically, and the time remaining from the current moment until this voice input ends automatically;

calculating the time remaining from the current moment until the current voice clause ends automatically includes: obtaining a first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining until the current voice clause ends automatically;

calculating the time remaining from the current moment until this voice input ends automatically includes: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining until this voice input ends automatically;

the first preset duration is the minimum time interval between voice clauses; the second preset duration is the time from detection of the end endpoint of the speech until this voice input ends automatically.

Preferably, showing endpoint prompt information to the user according to the calculation result until the pause ends includes:

if both the time remaining until the current voice clause ends automatically and the time remaining until this voice input ends automatically are greater than zero, showing both remaining times to the user;

if the time remaining until the current voice clause ends automatically is less than or equal to zero and the time remaining until this voice input ends automatically is greater than zero, showing the user a voice-clause-end prompt together with the time remaining until this voice input ends automatically;

if the time remaining until this voice input ends automatically is less than or equal to zero, showing the user a prompt that this voice input has ended automatically.

Preferably, showing endpoint prompt information to the user includes:

showing the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, and a prompt tone.

In another aspect, a voice input system is provided, including:

a receiving module, configured to receive, in real time, the audio signal produced while a user inputs speech;

an endpoint detection module, configured to perform endpoint detection on the audio signal;

a determining module, configured to determine, from the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;

a computing module, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state; the endpoint time includes the time remaining from the current moment until the current voice clause ends automatically;

a display module, configured to show endpoint prompt information to the user according to the calculation result of the computing module until the pause ends.

Preferably, the computing module includes:

a first computing unit, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current voice clause ends automatically, including: obtaining a first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining until the current voice clause ends automatically, the first preset duration being the minimum time interval between voice clauses;

a second computing unit, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until this voice input ends automatically, including: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining until this voice input ends automatically, the second preset duration being the time from detection of the end endpoint of the speech until this voice input ends automatically.

Preferably, the display module is specifically configured to: when both the time remaining until the current voice clause ends automatically and the time remaining until this voice input ends automatically are greater than zero, show both remaining times to the user; when the time remaining until the current voice clause ends automatically is less than or equal to zero and the time remaining until this voice input ends automatically is greater than zero, show the user a voice-clause-end prompt together with the time remaining until this voice input ends automatically; and when the time remaining until this voice input ends automatically is less than or equal to zero, show the user a prompt that this voice input has ended automatically.

Preferably, the display module is specifically configured to show the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, and a prompt tone.

The voice input method and system provided by the embodiments of the present invention determine, through endpoint detection, whether the speech signal is in a paused state. When it is, endpoint prompt information is shown to the user, so that the user knows how much time remains before the current voice clause ends automatically, can adjust the speaking rate, and can pause at a suitable moment. This effectively improves voice input quality and thereby the accuracy of speech recognition.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them.

Fig. 1 is a flowchart of a voice input method in the prior art;

Fig. 2 is a flowchart of the voice input method provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the voice input system provided by an embodiment of the present invention;

Fig. 4 is another schematic structural diagram of the voice input system provided by an embodiment of the present invention.

Detailed description of the embodiments

The embodiments of the present invention provide a voice input method and system. By showing endpoint prompt information to the user, they enable the user to adjust the speaking rate and to control sensibly when to pause and for how long, which effectively improves voice input quality and thereby the accuracy of speech recognition.

Fig. 2 shows a flowchart of a voice input method provided by an embodiment of the present invention, which includes the following steps:

Step 201: Receive, in real time, the audio signal produced while the user inputs speech.

Step 202: Perform endpoint detection on the audio signal, and determine from the detection result whether the speech in the audio signal is in a paused state.

Because the speech signal in an audio signal is short-term stationary, the audio signal can be divided into frames: the whole audio is cut into sub-segments of a specific length, which preserves the spectral continuity of each sub-segment. Since only a limited length of audio can be processed at a time, the audio signal must also be windowed, so that each processing step is restricted to the signal inside the window; a Hamming window or a Hanning window, for example, can be used. Preferably, each frame of sub-segment audio is 25 ms long with a frame shift of 10 ms. Framing and windowing a piece of audio of a given length yields multiple speech frames. The speech frame is the smallest unit on which the speech/non-speech decision is made in the audio signal.
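The framing and windowing step can be illustrated with a minimal Python sketch. It assumes the audio is already available as a list of float samples; the function name `frame_signal` and its parameters are hypothetical and are not taken from the patent, and only the 25 ms frame length, the 10 ms frame shift, and the Hamming window come from the text above.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=25, shift_ms=10):
    """Split samples into overlapping frames and apply a Hamming window."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    shift = int(sample_rate * shift_ms / 1000)      # samples per frame shift
    # Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Only complete frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, shift):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

At a 16 kHz sampling rate (an assumed value), each frame holds 400 samples and consecutive frames start 160 samples apart, so adjacent frames overlap by 15 ms.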

Endpoint detection essentially computes feature information from each resulting speech frame, for example time-domain energy, frequency-domain energy, or zero-crossing rate, in order to distinguish speech from non-speech, where non-speech may be either silence or noise. For audio recorded in a quiet environment, the energy of speech segments is generally higher than that of non-speech segments, and the zero-crossing rate of a speech signal is generally lower than that of a non-speech signal. The zero-crossing rate is the number of times per unit time that the sampled audio signal value crosses zero (changes from positive to negative or from negative to positive). Computing these features makes it possible to distinguish speech from non-speech effectively, and thus to judge whether the current audio signal is a speech signal or a non-speech signal. When the current audio signal is judged to be a non-speech signal, the speech in the audio signal can be considered to be in a paused state. Endpoint detection can therefore effectively identify the start endpoint and the end endpoint of the speech in the audio signal.
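As a rough illustration of the features named above, the following Python sketch computes the short-time energy and the zero-crossing rate of one frame and combines them in a simple threshold rule. The function names and both threshold values are assumptions made for this example; the patent does not specify a decision rule or thresholds. The zero-crossing rate is normalized per adjacent sample pair here rather than per unit time.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame (time-domain energy)."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_speech(frame, energy_threshold=0.01, zcr_threshold=0.3):
    """Hypothetical rule: speech has high energy and a low zero-crossing rate.

    Both thresholds are illustrative and would need tuning on real audio.
    """
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)
```

A frame for which `is_speech` returns `False` would count toward the pause; a practical detector would likely smooth this decision over several frames.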

Step 203: If so, calculate the endpoint time at a preset period, and show endpoint prompt information to the user according to the calculation result until the pause ends.

If the endpoint detection result shows that the speech in the audio signal has not paused, a no-pause prompt can also be fed back to the user at the preset period, so that the user knows from the prompt that the speech signal has not paused.

When the speech signal is detected to have paused for a certain time and then left the paused state as voice input continues, the endpoint time can be restored to its default value (for example, reset); when a pause in the speech signal is detected again, the updated endpoint time is again calculated at the preset period. Whether the user is continuing voice input can be determined by the endpoint detection described above: if the detection result shows that the speech signal leaves the paused state after pausing for a period of time, the user is considered to be continuing voice input; otherwise, the speech signal is considered to remain in the paused state.
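The reset behaviour can be sketched as a small state tracker. This is only one possible reading: the class `PauseTracker` is hypothetical, and it assumes that a per-frame speech/non-speech decision arrives once per 10 ms frame shift and that the quantity being reset is the accumulated pause duration.

```python
class PauseTracker:
    """Accumulates the current pause duration and resets it when speech resumes."""

    def __init__(self, frame_shift_ms=10):
        self.frame_shift_ms = frame_shift_ms
        self.pause_ms = 0  # default value; 0 means "not paused"

    def update(self, frame_is_speech):
        """Feed one frame decision; return the pause duration so far in ms."""
        if frame_is_speech:
            self.pause_ms = 0                     # pause ended: restore default
        else:
            self.pause_ms += self.frame_shift_ms  # pause continues
        return self.pause_ms
```

Feeding the decisions speech, silence, silence, silence, speech, silence yields pause durations of 0, 10, 20, 30, 0 and 10 ms.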

The endpoint time can include the time remaining from the current moment until the current voice clause ends automatically, denoted M ms (milliseconds). Because the data processing speed is fixed, the amount of audio data that can be processed at a time can be converted into a time length, denoted K ms. From the moment the speech signal starts to pause until the pause ends, a new endpoint time is calculated and fed back every K ms, and endpoint prompt information is shown to the user at the same time. For ease of description, K is called the feedback interval or the preset period in the embodiments of the present invention. The endpoint time M tells how much longer, from the current moment, the speech signal must remain paused before the current voice clause ends automatically.

The endpoint time can also include the time remaining from the current moment until this voice input ends automatically, denoted N ms. The endpoint time N tells how much longer, from the current moment, the speech signal must remain paused before this voice input ends automatically. Preferably, N ≥ M.

In the embodiments of the present invention, two time lengths can be preset: a first preset duration T₁ and a second preset duration T₂. The first preset duration T₁ is the minimum time interval between voice clauses, and the second preset duration T₂ is the time from detection of the end endpoint of the speech until this voice input ends automatically, so 0 ≤ M ≤ T₁ and 0 ≤ N ≤ T₂. After the speech signal pauses, the pause duration is denoted T_s. If T_s is greater than or equal to T₁, the speech signals before and after the pause are judged to belong to different voice clauses; if T_s is less than T₁, they are judged to belong to the same voice clause; if T_s is greater than or equal to T₂, this voice input is judged to end automatically. Preferably, T₁ can be set to 300 to 400 ms, T₂ to 1000 to 2000 ms, and K to 50 ms.
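The three decisions based on T_s, T₁ and T₂ can be written as a short Python function. The function name, the return labels, and the default values (400 ms and 1200 ms, chosen from within the preferred ranges) are assumptions made for this sketch.

```python
def classify_pause(pause_ms, t1_ms=400, t2_ms=1200):
    """Classify a pause of duration T_s against the thresholds T1 and T2."""
    if pause_ms >= t2_ms:
        return "input_ended"   # T_s >= T2: this voice input ends automatically
    if pause_ms >= t1_ms:
        return "clause_ended"  # T1 <= T_s < T2: a new voice clause begins
    return "same_clause"       # T_s < T1: still within the same voice clause
```

The check against T₂ comes first because a pause that reaches T₂ has necessarily also reached T₁ when T₂ is at least as large as T₁.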

After the speech signal pauses, the pause duration T_s at the first feedback does not exceed the feedback interval K, so T_s ≤ K. Because the pause duration at the first feedback of the endpoint time is T_s, the initial feedback value of M is M₀ = T₁ - T_s and the initial feedback value of N is N₀ = T₂ - T_s. Thereafter, as long as the speech signal remains in the paused state, M and N are updated every K ms as follows: M_i = M_{i-1} - K, N_i = N_{i-1} - K.
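The initial values and the per-period update can be traced with a few lines of Python. The function `countdown` and its argument names are hypothetical; the arithmetic follows M₀ = T₁ - T_s, N₀ = T₂ - T_s and the decrement by K described above.

```python
def countdown(t1_ms, t2_ms, first_pause_ms, k_ms, steps):
    """Return the fed-back (M_i, N_i) pairs for i = 0 .. steps."""
    m = t1_ms - first_pause_ms  # M_0 = T1 - T_s
    n = t2_ms - first_pause_ms  # N_0 = T2 - T_s
    values = [(m, n)]
    for _ in range(steps):
        m, n = m - k_ms, n - k_ms  # M_i = M_{i-1} - K, N_i = N_{i-1} - K
        values.append((m, n))
    return values
```

With T₁ = 400 ms, T₂ = 1200 ms, T_s = 50 ms and K = 50 ms, the first four feedback values are (350, 1150), (300, 1100), (250, 1050) and (200, 1000). Unrolling the recurrence gives the equivalent closed form M_i = T₁ - (T_s + i·K).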

Calculating the endpoint time at the preset period includes calculating the time M remaining from the current moment until the current voice clause ends automatically and the time N remaining from the current moment until this voice input ends automatically. M can be obtained by subtracting the duration T_s for which the current pause has lasted from the first preset duration T₁; N can be obtained by subtracting T_s from the second preset duration T₂.

Showing endpoint prompt information to the user according to the calculated endpoint time until the pause ends mainly covers the following cases:

(1) M_i > 0 and N_i > 0: the endpoint prompt information shown to the user includes the values of M_i and N_i.

If the time M_i remaining until the current voice clause ends automatically is greater than 0, the speech signal is considered to be still in the paused state, and the decision that the current voice clause ends automatically has not yet been made. If the time N_i remaining until this voice input ends automatically is greater than 0, the speech signal is considered to be still in the paused state, and the decision that this voice input ends automatically has not yet been made. Showing the values of M_i and N_i to the user at this point lets the user see intuitively how much time remains before the current voice clause ends automatically and how much time remains before this voice input ends automatically, so that the user can control the speaking rate, the moment of pausing, and the pause duration.

(2) M_i ≤ 0 and N_i > 0: the endpoint prompt information shown to the user includes a voice-clause-end prompt and the value of N_i.

If the time M_i remaining until the current voice clause ends automatically is less than or equal to 0, the speech signal is considered to be still in the paused state, but the pause duration is greater than or equal to the minimum time interval T₁ between voice clauses, so the voice-clause-end decision has been made. If the time N_i remaining until this voice input ends automatically is greater than 0, the speech signal is considered to be still in the paused state, and the decision that this voice input ends automatically has not yet been made. At this point, a voice-clause-end prompt and the time remaining until this voice input ends automatically can be shown to the user, so that the user sees intuitively that the current voice clause has ended and how much time remains before this voice input ends automatically, and can control the speaking rate, the moment of pausing, and the pause duration.

(3) N_i ≤ 0: the endpoint prompt information shown to the user includes a prompt that this voice input has ended automatically.

If the time N_i remaining until this voice input ends automatically is less than or equal to 0, the speech signal is considered to be still in the paused state, and the decision that this voice input ends automatically has been made. At this point, a prompt that this voice input has ended automatically can be shown to the user, so that the user can control the speaking rate, the moment of pausing, and the pause duration. Note that after this voice input ends automatically, the endpoint time no longer needs to be calculated at the preset period; once voice input is restarted and the speech signal is again detected to be in the paused state, the endpoint time is again calculated at the preset period.
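The three cases above amount to a small selection function. In this Python sketch the function name and the dictionary keys are hypothetical; only the conditions on M_i and N_i come from the description.

```python
def endpoint_prompt(m_ms, n_ms):
    """Choose the endpoint prompt information for the current M_i and N_i."""
    if n_ms <= 0:
        # Case (3): this voice input has ended automatically.
        return {"input_ended": True}
    if m_ms <= 0:
        # Case (2): the clause has ended; show the remaining time N_i.
        return {"clause_ended": True, "input_remaining_ms": n_ms}
    # Case (1): both countdowns are still running; show M_i and N_i.
    return {"clause_remaining_ms": m_ms, "input_remaining_ms": n_ms}
```

Case (3) is tested first because it applies whatever the value of M_i; the remaining two cases then only need to examine M_i.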

The endpoint prompt information can be shown to the user in many ways, which can be configured as needed. For example, any one or more of a numeric display, a progress bar, and a prompt tone can be used, so that the user understands the recording state intuitively and can adjust the speaking rate, the moment of pausing, and the pause duration in time, thereby obtaining a high-quality recording and improving the accuracy of speech recognition.
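As one illustration of the progress-bar and numeric display options, the following Python sketch renders a remaining time as a text bar. The function name, the bar width, and the characters used are arbitrary choices for this example; the patent does not describe a particular rendering.

```python
def progress_bar(remaining_ms, total_ms, width=20):
    """Render the remaining time as a text progress bar with a numeric label."""
    remaining_ms = max(0, min(remaining_ms, total_ms))  # clamp to [0, total]
    filled = round(width * remaining_ms / total_ms)
    return "[" + "#" * filled + "-" * (width - filled) + "] " + f"{remaining_ms} ms"
```

For example, `progress_bar(200, 400, width=10)` returns `[#####-----] 200 ms`, which combines the progress bar with the numeric value of M.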

The technical solution of the embodiments of the present invention is described in detail below through a specific example.

For example, the audio signal input by the user is: "Today // the weather is fine // I am going on an outing //", where each "//" marks a pause in the speech signal. Suppose the pause between "today" and "the weather" lasts 200 ms, the pause between "fine" and "I" lasts 500 ms, and the user stays silent for 1500 ms after "outing". As soon as the user finishes saying "today", the pause begins, and at that moment M = T₁ = 400 ms and N = T₂ = 1200 ms. After 200 ms of pause, M has dropped to 200 ms, meaning that a further 200 ms of pause would be needed before the voice clause "today" is judged to have ended, and N has dropped to 1000 ms, meaning that a further 1000 ms of pause would be needed before this voice input is judged to end automatically. However, the user then ends the pause and starts saying "the weather", so neither M nor N reaches 0, and both are restored to their default value (which can be set to 0). When "fine" has been spoken and the next pause begins, again M = T₁ = 400 ms and N = T₂ = 1200 ms. This pause lasts 500 ms. When it reaches 400 ms, M drops to 0 and the voice clause "Today the weather is fine" is judged to have ended; at the end of the 500 ms pause, however, N = 700 ms, which is still above 0, so the decision that this voice input ends automatically is not made. After "I am going on an outing" is spoken, a 1500 ms pause follows. When the pause reaches 400 ms, M drops to 0 and the voice clause "I am going on an outing" is judged to have ended; when the pause reaches 1200 ms, N drops to 0 and this voice input is judged to end automatically, after which speech can no longer be input even if the user continues speaking.
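The numbers in this example can be checked with a short simulation. The Python sketch below is an illustration only: `simulate_pause` is a hypothetical helper, K = 50 ms is taken from the preferred value given earlier, and, as in the example, the countdown starts from M = T₁ and N = T₂ at the beginning of each pause.

```python
def simulate_pause(pause_ms, t1_ms=400, t2_ms=1200, k_ms=50):
    """Simulate one pause; return the final (M, N, events), updating every K ms."""
    m, n = t1_ms, t2_ms  # the example starts each countdown at T1 and T2
    events = []
    elapsed = 0
    while elapsed < pause_ms and n > 0:
        elapsed += k_ms
        m_prev = m
        m, n = m - k_ms, n - k_ms
        if m_prev > 0 and m <= 0:
            events.append(("clause_ended", elapsed))  # M has just reached 0
        if n <= 0:
            events.append(("input_ended", elapsed))   # N has just reached 0
    return max(m, 0), max(n, 0), events
```

For the three pauses of 200 ms, 500 ms and 1500 ms, the simulation gives M = 200 and N = 1000 with no decision, then a clause end at 400 ms with N = 700, and finally a clause end at 400 ms followed by the automatic end of the voice input at 1200 ms, matching the walkthrough.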

The voice input method provided by the embodiments of the present invention determines, through endpoint detection, whether the speech signal is in a paused state. When it is, endpoint prompt information is shown to the user, so that the user knows how much time remains before the current voice clause ends automatically, can adjust the speaking rate, and can pause at a suitable moment. This effectively improves voice input quality and thereby the accuracy of speech recognition.

Correspondingly, an embodiment of the present invention also provides a voice input system, whose schematic structure is shown in Fig. 3. The voice input system includes:

a receiving module 301, configured to receive, in real time, the audio signal produced while the user inputs speech;

an endpoint detection module 302, configured to perform endpoint detection on the audio signal;

a determining module 303, configured to determine, from the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;

a computing module 304, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state, where the endpoint time includes the time remaining from the current moment until the current voice clause ends automatically;

a display module 305, configured to show endpoint prompt information to the user according to the calculation result of the computing module until the pause ends.

Further, the endpoint time can also include the time remaining from the current moment until this voice input ends automatically.

As shown in Fig. 4, the computing module 304 can include:

a first computing unit 401, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until the current voice clause ends automatically, including: obtaining the first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the first preset duration to obtain the time remaining until the current voice clause ends automatically, the first preset duration being the minimum time interval between voice clauses;

a second computing unit 402, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the time remaining from the current moment until this voice input ends automatically, including: obtaining the second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting that pause duration from the second preset duration to obtain the time remaining until this voice input ends automatically, the second preset duration being the time from detection of the end endpoint of the speech until this voice input ends automatically.

The display module 305 is specifically configured to: when both the time remaining until the current voice clause ends automatically and the time remaining until this voice input ends automatically are greater than zero, show both remaining times to the user; when the time remaining until the current voice clause ends automatically is less than or equal to zero and the time remaining until this voice input ends automatically is greater than zero, show the user a voice-clause-end prompt together with the time remaining until this voice input ends automatically; and when the time remaining until this voice input ends automatically is less than or equal to zero, show the user a prompt that this voice input has ended automatically.

The display module 305 is specifically configured to show the endpoint prompt information to the user in any one or more of three ways: a numeric display, a progress bar, and a prompt tone.
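The division into modules can be pictured with a compact Python sketch. Everything in it is an assumption made for illustration: the class `VoiceInputSystem`, the use of plain callables for the endpoint detection and display modules, and the simplification that the endpoint time is recomputed once per 10 ms frame instead of once per preset period K.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class VoiceInputSystem:
    detect: Callable[[List[float]], bool]  # endpoint detection module 302
    display: Callable[[int, int], None]    # display module 305
    t1_ms: int = 400                       # first preset duration T1
    t2_ms: int = 1200                      # second preset duration T2
    frame_shift_ms: int = 10
    pause_ms: int = 0                      # duration of the current pause

    def receive(self, frame: List[float]) -> Optional[Tuple[int, int]]:
        """Receiving module 301: process one frame of the audio signal."""
        paused = not self.detect(frame)    # determining module 303
        if not paused:
            self.pause_ms = 0              # speech resumed: reset the pause
            return None
        self.pause_ms += self.frame_shift_ms
        m = self.t1_ms - self.pause_ms     # first computing unit 401
        n = self.t2_ms - self.pause_ms     # second computing unit 402
        self.display(m, n)
        return m, n
```

Passing the modules in as callables keeps the sketch independent of any particular detector or user interface; a real system would supply its own implementations.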

The voice input system provided by the embodiments of the present invention determines, through endpoint detection, whether the speech signal is in a paused state. When it is, endpoint prompt information is shown to the user, so that the user knows how much time remains before the current voice clause ends automatically, can adjust the speaking rate, and can pause at a suitable moment. This effectively improves voice input quality and thereby the accuracy of speech recognition.

The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly, and the relevant parts can be found in the description of the method embodiment. The system embodiment described above is only illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. a kind of voice input method, it is characterised in that include：

Audio signal during real-time reception user speech typing；

End-point detection is carried out to the audio signal, and determines according to testing result whether the voice in the audio signal is in Standstill state；

If it is, endpoint time is calculated according to predetermined period, and end points information is shown to user according to result of calculation, directly Terminate to this pause；The endpoint time includes：The remaining time that current time terminates automatically to current speech clause；Its In, calculating current time to the remaining time that current speech clause terminates automatically includes：Obtain the first preset duration and this language First preset duration is deducted the lasting duration of this voice signal pause by the lasting duration of message number pause The remaining time that the current time terminates automatically to current speech clause is obtained, first preset duration is between voice clause Minimum interval.

2. The method according to claim 1, characterized in that the endpoint time further comprises: the remaining time from the current moment until the current voice input ends automatically.

3. The method according to claim 2, characterized in that calculating the remaining time from the current moment until the current voice input ends automatically comprises: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting the duration of the current pause from the second preset duration to obtain the remaining time from the current moment until the current voice input ends automatically;

the second preset duration is the time from detection of the end endpoint of the speech until the current voice input ends automatically.

4. The method according to claim 3, characterized in that displaying the endpoint prompt information to the user according to the calculation result until the current pause ends comprises:

if the remaining time from the current moment until the current speech clause ends automatically and the remaining time from the current moment until the current voice input ends automatically are both greater than zero, displaying both remaining times to the user;

if the remaining time from the current moment until the current speech clause ends automatically is less than or equal to zero, and the remaining time from the current moment until the current voice input ends automatically is greater than zero, displaying a clause-ended prompt to the user, and displaying to the user the remaining time from the current moment until the current voice input ends automatically;

if the remaining time from the current moment until the current voice input ends automatically is less than or equal to zero, displaying to the user a prompt that the current voice input has ended automatically.

5. The method according to any one of claims 1 to 4, characterized in that displaying the endpoint prompt information to the user comprises:

displaying the endpoint prompt information to the user in any one or more of the following three ways: a numeric graphic, a progress bar, and a prompt tone.

6. A voice input system, characterized by comprising:

a determining module, configured to determine, according to the detection result of the endpoint detection module, whether the speech in the audio signal is in a paused state;

a computing module, configured to calculate an endpoint time at a preset period after the determining module determines that the speech in the audio signal is in a paused state; the endpoint time comprises: the remaining time from the current moment until the current speech clause ends automatically;

a display module, configured to display endpoint prompt information to the user according to the calculation result of the computing module until the current pause ends;

the computing module comprises: a first computing unit, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the remaining time from the current moment until the current speech clause ends automatically, including: obtaining a first preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting the duration of the current pause from the first preset duration to obtain the remaining time from the current moment until the current speech clause ends automatically, the first preset duration being the minimum interval between speech clauses.

7. The system according to claim 6, characterized in that the endpoint time further comprises: the remaining time from the current moment until the current voice input ends automatically.

8. The system according to claim 7, characterized in that the computing module further comprises:

a second computing unit, configured to calculate, at the preset period after the determining module determines that the speech in the audio signal is in a paused state, the remaining time from the current moment until the current voice input ends automatically, including: obtaining a second preset duration and the duration for which the current pause in the speech signal has lasted, and subtracting the duration of the current pause from the second preset duration to obtain the remaining time from the current moment until the current voice input ends automatically, the second preset duration being the time from detection of the end endpoint of the speech until the current voice input ends automatically.

9. The system according to claim 8, characterized in that:

the display module is specifically configured to: when the remaining time from the current moment until the current speech clause ends automatically and the remaining time from the current moment until the current voice input ends automatically are both greater than zero, display both remaining times to the user; when the remaining time from the current moment until the current speech clause ends automatically is less than or equal to zero and the remaining time from the current moment until the current voice input ends automatically is greater than zero, display a clause-ended prompt to the user and display to the user the remaining time from the current moment until the current voice input ends automatically; and when the remaining time from the current moment until the current voice input ends automatically is less than or equal to zero, display to the user a prompt that the current voice input has ended automatically.

10. The system according to any one of claims 6 to 9, characterized in that:

the display module is specifically configured to display the endpoint prompt information to the user in any one or more of the following three ways: a numeric graphic, a progress bar, and a prompt tone.
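
Illustrative sketch (not part of the claims). The remaining-time calculations of claims 1 and 3 and the three display cases of claim 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `EndpointPrompt`, `remaining_time`, and `build_prompt`, the message strings, and the example durations are hypothetical and do not appear in the patent; the claims specify only that each remaining time is a preset duration minus the duration of the current pause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointPrompt:
    """What would be shown to the user during a pause (hypothetical structure)."""
    clause_remaining: Optional[float]   # seconds until the clause auto-ends; None once ended
    session_remaining: Optional[float]  # seconds until the voice input auto-ends; None once ended
    message: str                        # hypothetical prompt text


def remaining_time(preset_duration: float, pause_elapsed: float) -> float:
    """Remaining time = preset duration minus how long the current pause has lasted."""
    return preset_duration - pause_elapsed


def build_prompt(first_preset: float, second_preset: float,
                 pause_elapsed: float) -> EndpointPrompt:
    """Choose the prompt for one calculation period, following the cases of claim 4."""
    clause_left = remaining_time(first_preset, pause_elapsed)
    session_left = remaining_time(second_preset, pause_elapsed)

    # Case 3: the voice input itself has timed out.
    if session_left <= 0:
        return EndpointPrompt(None, None, "voice input ended automatically")
    # Case 2: the clause has ended, but the voice input is still open.
    if clause_left <= 0:
        return EndpointPrompt(None, session_left, "clause ended")
    # Case 1: both countdowns are still running.
    return EndpointPrompt(clause_left, session_left, "paused")


if __name__ == "__main__":
    # Assumed example values: 0.5 s clause interval, 2.0 s session timeout.
    for elapsed in (0.25, 1.0, 2.0):
        print(elapsed, build_prompt(0.5, 2.0, elapsed))
```

In a real system `build_prompt` would presumably be called once per preset period while the pause lasts, and its result rendered as a numeric graphic, a progress bar, or a prompt tone as in claims 5 and 10. Checking the session timeout first is a design choice of this sketch; it also covers the combination the claims do not list separately (clause time remaining but session time exhausted).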