CN108962283A

CN108962283A - Method, apparatus and electronic device for determining question-end silence time

Info

Publication number: CN108962283A
Application number: CN201810083491.9A
Authority: CN
Inventors: 高慧湍; 李宝祥
Original assignee: Beijing Orion Star Technology Co Ltd
Current assignee: Beijing Orion Star Technology Co Ltd
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2018-12-07
Anticipated expiration: 2038-01-29
Also published as: CN108962283B

Abstract

Embodiments of the invention provide a method, an apparatus and an electronic device for determining a question-end silence time. The method comprises: obtaining a user voice signal collected by an intelligent voice terminal; determining speech rate information of the user voice signal, where the speech rate information is information that characterizes the speech rate of the user voice signal; and determining the question-end silence time according to the speech rate information and a preset silence time setting rule. Determining the question-end silence time in this way allows it to be set reasonably according to the user's speech rate, so that the intelligent voice terminal can respond accurately to users who speak at different rates, which improves the response accuracy of the terminal and the user experience.

Description

Method, apparatus and electronic device for determining question-end silence time

Technical field

The present invention relates to the field of artificial intelligence, and in particular to a method, an apparatus and an electronic device for determining a question-end silence time.

Background

In recent years, with the rapid development of artificial intelligence technology, many artificial intelligence devices have appeared on the market. Some of these devices embed intelligent voice technology: the user can control the device by voice and interact with it by voice, for example to look up the weather, set an alarm, have a story told, or chat. Such devices can be called intelligent voice terminals; examples include smart speakers and robots capable of voice interaction.

When an intelligent voice terminal implements the voice interaction functions above, response speed is clearly very important. While collecting a user voice signal, the terminal can send the collected signal in real time to a server with which it is communicatively connected. On receiving the user voice signal, the server monitors the silence time of the signal, and when the silence time reaches a preset time it determines that the user voice signal has ended. In other words, once a period of silence follows the user's speech, the server judges that the current spoken question has ended and performs speech recognition and other parsing on that segment of the user voice signal. This preset time can be called the question-end silence time; it marks the end of the user's current question.

In a typical intelligent voice terminal, the question-end silence time is set in advance and cannot be changed. Because speech rates differ greatly between users, a fixed question-end silence time often means that a fast speaker must wait a relatively long time after actually finishing a question before the terminal responds, while a slow speaker is often interrupted by the terminal's response before finishing a sentence. Determining the question-end silence time in this way therefore makes the terminal respond inaccurately and gives a poor user experience.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a method, an apparatus and an electronic device for determining a question-end silence time, so as to improve the response accuracy of an intelligent voice terminal and the user experience. The specific technical solutions are as follows:

In a first aspect, an embodiment of the present invention provides a method for determining a question-end silence time, the method comprising:

obtaining a user voice signal collected by an intelligent voice terminal;

determining speech rate information of the user voice signal, where the speech rate information is information that characterizes the speech rate of the user voice signal;

determining the question-end silence time according to the speech rate information and a preset silence time setting rule.

Optionally, the step of obtaining the user voice signal collected by the intelligent voice terminal comprises:

obtaining, in real time, the user voice signal collected by the intelligent voice terminal;

before the step of determining the speech rate information of the user voice signal, the method further comprises:

detecting that the duration of the user voice signal has reached a preset duration;

the step of determining the question-end silence time according to the speech rate information and the preset silence time setting rule comprises:

determining, according to the speech rate information and the preset silence time setting rule, the question-end silence time corresponding to the currently obtained user voice signal.

Optionally, the speech rate information is an average speech rate;

the step of determining the speech rate information of the user voice signal comprises:

obtaining the duration of the user voice signal;

performing speech recognition on the user voice signal to obtain the character count corresponding to the user voice signal;

determining the average speech rate of the user voice signal according to the character count and the duration.

Optionally, the step of determining the question-end silence time according to the speech rate information and the preset silence time setting rule comprises:

determining the question-end silence time according to how the average speech rate compares with a preset speech rate threshold.

Optionally, the preset speech rate threshold comprises a first preset speech rate threshold and a second preset speech rate threshold, where the first preset speech rate threshold is less than the second preset speech rate threshold;

the step of determining the question-end silence time according to how the average speech rate compares with the preset speech rate threshold comprises:

when the average speech rate is less than the first preset speech rate threshold, determining that the question-end silence time is a first silence time;

when the average speech rate is greater than the first preset speech rate threshold and less than the second preset speech rate threshold, determining that the question-end silence time is a second silence time;

when the average speech rate is greater than the second preset speech rate threshold, determining that the question-end silence time is a third silence time, where the first silence time is greater than the second silence time, and the second silence time is greater than the third silence time.

Optionally, the speech rate information is the average interval time between characters;

the step of determining the speech rate information of the user voice signal comprises:

performing speech recognition on the user voice signal to obtain the interval times between adjacent characters in the text corresponding to the user voice signal;

calculating, from the interval times, the average interval time corresponding to the user voice signal.

Optionally, the step of determining the question-end silence time according to the speech rate information and the preset silence time setting rule comprises:

determining the question-end silence time according to how the average interval time compares with a preset time threshold.

Optionally, the preset time threshold comprises a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold;

the step of determining the question-end silence time according to how the average interval time compares with the preset time threshold comprises:

when the average interval time is greater than the first preset time threshold, determining that the question-end silence time is a fourth silence time;

when the average interval time is less than the first preset time threshold and greater than the second preset time threshold, determining that the question-end silence time is a fifth silence time;

when the average interval time is less than the second preset time threshold, determining that the question-end silence time is a sixth silence time, where the fourth silence time is greater than the fifth silence time, and the fifth silence time is greater than the sixth silence time.
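The interval-based variant above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the per-character `(start_ms, end_ms)` timestamps assumed to come from a speech recognizer, the definition of an interval as the gap between one character's end and the next character's start, and all numeric values are assumptions. The text does not say how a value exactly equal to a threshold is handled; this sketch assigns it to the middle tier.

```python
from typing import Sequence, Tuple


def mean_interval_ms(char_spans: Sequence[Tuple[int, int]]) -> float:
    """Average gap between adjacent characters.

    char_spans holds one (start_ms, end_ms) pair per recognized character,
    in spoken order (hypothetical recognizer output).
    """
    if len(char_spans) < 2:
        raise ValueError("need at least two characters to measure an interval")
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(char_spans, char_spans[1:])
    ]
    return sum(gaps) / len(gaps)


def silence_time_from_interval(
    mean_gap_ms: float,
    first_threshold_ms: float = 300.0,   # hypothetical value
    second_threshold_ms: float = 150.0,  # hypothetical value
    fourth_ms: int = 900,                # hypothetical fourth silence time
    fifth_ms: int = 700,                 # hypothetical fifth silence time
    sixth_ms: int = 500,                 # hypothetical sixth silence time
) -> int:
    """Map the average interval to a question-end silence time (three tiers)."""
    if not first_threshold_ms > second_threshold_ms:
        raise ValueError("first threshold must exceed second threshold")
    if mean_gap_ms > first_threshold_ms:
        return fourth_ms  # long pauses between characters: wait longest
    if mean_gap_ms < second_threshold_ms:
        return sixth_ms   # short pauses: respond soonest
    return fifth_ms       # in between (equality included by assumption)
```

A longer average interval indicates a slower speaker, so the tiers run in the opposite direction to the average speech rate tiers.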

Optionally, after the step of determining the question-end silence time according to the speech rate information and the preset silence time setting rule, the method further comprises:

when it is detected that the silence time corresponding to the user voice signal currently collected by the intelligent voice terminal has reached the determined question-end silence time, responding to the user instruction corresponding to the currently collected user voice signal, where the user instruction is an instruction determined according to the semantics of the currently collected user voice signal.

In a second aspect, an embodiment of the present invention provides an apparatus for determining a question-end silence time, the apparatus comprising:

a voice signal obtaining module, configured to obtain a user voice signal collected by an intelligent voice terminal;

a speech rate information determining module, configured to determine speech rate information of the user voice signal, where the speech rate information is information that characterizes the speech rate of the user voice signal;

a silence time determining module, configured to determine the question-end silence time according to the speech rate information and a preset silence time setting rule.

Optionally, the voice signal obtaining module comprises:

a real-time obtaining submodule, configured to obtain, in real time, the user voice signal collected by the intelligent voice terminal;

the apparatus further comprises:

a preset duration monitoring module, configured to detect, before the speech rate information of the user voice signal is determined, that the duration of the user voice signal has reached a preset duration;

the silence time determining module comprises:

a silence time determining submodule, configured to determine, according to the speech rate information and the preset silence time setting rule, the question-end silence time corresponding to the currently obtained user voice signal.

Optionally, the speech rate information is an average speech rate;

the speech rate information determining module comprises:

a duration obtaining submodule, configured to obtain the duration of the user voice signal;

a character count determining submodule, configured to perform speech recognition on the user voice signal to obtain the character count corresponding to the user voice signal;

an average speech rate determining submodule, configured to determine the average speech rate of the user voice signal according to the character count and the duration.

Optionally, the silence time determining module comprises:

a first determining submodule, configured to determine the question-end silence time according to how the average speech rate compares with a preset speech rate threshold.

The first determining submodule comprises:

a first determining unit, configured to determine that the question-end silence time is a first silence time when the average speech rate is less than the first preset speech rate threshold;

a second determining unit, configured to determine that the question-end silence time is a second silence time when the average speech rate is greater than the first preset speech rate threshold and less than the second preset speech rate threshold;

a third determining unit, configured to determine that the question-end silence time is a third silence time when the average speech rate is greater than the second preset speech rate threshold, where the first silence time is greater than the second silence time, and the second silence time is greater than the third silence time.

Optionally, where the speech rate information is the average interval time between characters, the speech rate information determining module comprises:

an interval time determining submodule, configured to perform speech recognition on the user voice signal to obtain the interval times between adjacent characters in the text corresponding to the user voice signal;

an average interval time determining submodule, configured to calculate, from the interval times, the average interval time corresponding to the user voice signal.

Optionally, the silence time determining module comprises:

a second determining submodule, configured to determine the question-end silence time according to how the average interval time compares with a preset time threshold.

The second determining submodule comprises:

a fourth determining unit, configured to determine that the question-end silence time is a fourth silence time when the average interval time is greater than the first preset time threshold;

a fifth determining unit, configured to determine that the question-end silence time is a fifth silence time when the average interval time is less than the first preset time threshold and greater than the second preset time threshold;

a sixth determining unit, configured to determine that the question-end silence time is a sixth silence time when the average interval time is less than the second preset time threshold, where the fourth silence time is greater than the fifth silence time, and the fifth silence time is greater than the sixth silence time.

Optionally, the apparatus further comprises:

an instruction response module, configured to, after the question-end silence time is determined according to the speech rate information and the preset silence time setting rule, respond to the user instruction corresponding to the currently collected user voice signal when it is detected that the silence time corresponding to the user voice signal currently collected by the intelligent voice terminal has reached the determined question-end silence time, where the user instruction is an instruction determined according to the semantics of the currently collected user voice signal.

In a third aspect, an embodiment of the present invention further provides an electronic device comprising a processor, a memory and a communication bus, where the processor and the memory communicate with each other through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the steps of the above method for determining a question-end silence time when executing the program stored in the memory.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for determining a question-end silence time.

In the solution provided by the embodiments of the present invention, the user voice signal collected by the intelligent voice terminal is obtained first, the speech rate information of the user voice signal is then determined, and the question-end silence time is finally determined according to the speech rate information and the preset silence time setting rule, where the speech rate information is information that characterizes the speech rate of the user voice signal. Determining the question-end silence time in this way allows it to be set reasonably according to the user's speech rate, so that the intelligent voice terminal can respond accurately to users who speak at different rates, which improves the response accuracy of the terminal and the user experience.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for determining a question-end silence time provided by an embodiment of the present invention;

Fig. 2 is a detailed flowchart of step S102 in the embodiment shown in Fig. 1;

Fig. 3 is another detailed flowchart of step S102 in the embodiment shown in Fig. 1;

Fig. 4 is a schematic structural diagram of an apparatus for determining a question-end silence time provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

To improve the response accuracy of an intelligent voice terminal and the user experience, the embodiments of the present invention provide a method, an apparatus, an electronic device and a computer-readable storage medium for determining a question-end silence time.

The method for determining a question-end silence time provided by the embodiments of the present invention is introduced first.

The method for determining a question-end silence time provided by the embodiments of the present invention can be applied to a server communicatively connected to an intelligent voice terminal, hereinafter referred to as the server. The intelligent voice terminal may be any smart device that can be controlled by voice and can interact with the user by voice, for example a smart speaker or a voice robot, and is not specifically limited here.

As shown in Fig. 1, a method for determining a question-end silence time comprises:

S101: obtaining a user voice signal collected by an intelligent voice terminal;

S102: determining speech rate information of the user voice signal, where the speech rate information is information that characterizes the speech rate of the user voice signal;

S103: determining the question-end silence time according to the speech rate information and a preset silence time setting rule.

It can be seen that, in the solution provided by the embodiment of the present invention, the server first obtains the user voice signal collected by the intelligent voice terminal, then determines the speech rate information of the user voice signal, and finally determines the question-end silence time according to the speech rate information and the preset silence time setting rule, where the speech rate information is information that characterizes the speech rate of the user voice signal. Determining the question-end silence time in this way allows it to be set reasonably according to the user's speech rate, so that the intelligent voice terminal can respond accurately to users who speak at different rates, which improves the response accuracy of the terminal and the user experience.
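The overall flow of steps S101 to S103 can be sketched as a small pipeline. This is only a structural illustration under stated assumptions: the function name `determine_silence_time_ms` and the two callables `measure_rate` and `setting_rule` are hypothetical stand-ins for step S102 and for the preset silence time setting rule of step S103, neither of which the text fixes to a particular implementation.

```python
from typing import Callable, TypeVar

RateInfo = TypeVar("RateInfo")


def determine_silence_time_ms(
    voice_signal: bytes,
    measure_rate: Callable[[bytes], RateInfo],
    setting_rule: Callable[[RateInfo], int],
) -> int:
    """Return a question-end silence time in milliseconds.

    voice_signal is the signal obtained in S101; measure_rate stands in for
    S102; setting_rule stands in for the preset rule applied in S103.
    """
    rate_info = measure_rate(voice_signal)  # S102: speech rate information
    return setting_rule(rate_info)          # S103: apply the preset rule
```

Keeping the measurement and the rule as separate callables mirrors the text: the speech rate information may be an average speech rate or an average interval time, and each form has its own rule.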

In step S101 above, when the user speaks, that is, produces a user voice signal, the intelligent voice terminal collects the user voice signal and sends it to the server in real time, so that the server obtains the user voice signal collected by the intelligent voice terminal.

In one embodiment, the user voice signal obtained by the server may be the user voice signal collected by the intelligent voice terminal at the current moment, for example a segment of voice signal corresponding to one or several sentences the user is currently speaking. In that case, the question-end silence time determined by the server is the one corresponding to the sentence or sentences the user is currently speaking. That is, after the user finishes the current question and the intelligent voice terminal collects the next segment of user voice signal, the server determines the question-end silence time for that next segment anew, so that the question-end silence time is determined dynamically, in real time, for each segment of user voice signal.

In another embodiment, the user voice signal obtained by the server may be the user voice signal collected by the intelligent voice terminal over a period of time. The period may be 3 days, 5 days, one week and so on, and is not specifically limited here. That is, the server may determine the question-end silence time once per preset period, based on the speech rate information of all or part of the user voice signals collected by the intelligent voice terminal within that period.

After obtaining the user voice signal collected by the intelligent voice terminal, the server can determine the speech rate information of the user voice signal, that is, perform step S102. The speech rate information is information that characterizes the speech rate of the user voice signal, in other words information that indicates how fast the user speaks. It may be, for example, the speech rate itself or the average interval time between characters, and is not specifically limited here. The server may determine the speech rate information using any method commonly used in speech signal processing, such as speech recognition; these methods are not specifically limited or described here.

For example, if the speech rate information is the average speech rate and the preset period is 3 days, the server can calculate the average speech rate of all or part of the user voice signals obtained within the 3 days and use it as the speech rate information. In this case, the server sets the question-end silence time once every 3 days.
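The period-based variant can be illustrated with a short sketch. It assumes, hypothetically, that the server has already stored a `(character_count, duration_in_seconds)` pair for each utterance collected in the period; the function name is likewise an assumption. The average is taken as total characters over total duration, which matches the definition of average speech rate as character count divided by duration.

```python
from typing import Iterable, Tuple


def average_rate_over_period(utterances: Iterable[Tuple[int, float]]) -> float:
    """Average speech rate (characters per second) over a preset period.

    Each item is (character_count, duration_s) for one stored utterance.
    """
    total_chars = 0
    total_seconds = 0.0
    for chars, seconds in utterances:
        total_chars += chars
        total_seconds += seconds
    if total_seconds <= 0:
        raise ValueError("no speech was collected in the period")
    return total_chars / total_seconds
```

Passing only a subset of the stored utterances corresponds to the "part of the user voice signals" case described above.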

Next, in step S103, the server can determine the question-end silence time according to the speech rate information and the preset silence time setting rule, for example according to how the speech rate of the user voice signal compares with a preset speech rate threshold. For clarity of presentation, specific embodiments of this determination are introduced by example later.

It should be noted that, in this document, "character" and "word" refer to the units that make up a sentence as distinguished by the conventions of each language, usually the units marked off by the pauses in the user's speech. For example, in Chinese, "character" and "word" refer to the Chinese characters divided according to Chinese convention, so the sentence "今天天气怎么样" ("How is the weather today") contains 7 characters: "今", "天", "天", "气", "怎", "么" and "样". In English, "character" and "word" can refer to one word. Similarly, in other languages such as Korean, Japanese or French, "character" and "word" can refer to the units that make up a sentence as distinguished by the conventions of that language, which are not enumerated here.
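The language-dependent counting unit described above can be sketched as follows. This is a simplified illustration: the function name, the language codes, and the choice to count only the basic CJK Unified Ideographs block for Chinese and runs of ASCII letters for English are assumptions, and a real recognizer would normally supply its own segmentation.

```python
import re


def count_units(text: str, language: str) -> int:
    """Count the sentence-forming units of a transcript.

    "zh": one unit per Chinese character (basic CJK block only).
    "en": one unit per word (letters, with an optional apostrophe part).
    """
    if language == "zh":
        return len(re.findall(r"[\u4e00-\u9fff]", text))
    if language == "en":
        return len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text))
    raise ValueError(f"unsupported language: {language}")
```

Punctuation and whitespace are ignored in both cases, so the count reflects only spoken units.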

In order to adjust the question-end silence time corresponding to the user voice signal in real time, so that the intelligent voice terminal can respond accurately to voice signals from different users, in one implementation of the embodiment of the present invention the step of obtaining the user voice signal collected by the intelligent voice terminal may include: obtaining, in real time, the user voice signal collected by the intelligent voice terminal.

The server can obtain the user voice signal in real time; that is, the intelligent voice terminal sends the user voice signal to the server while collecting it, and the server receives the signal and processes it accordingly.

Correspondingly, before the step of determining the speech rate information of the user voice signal, the method may further include: detecting that the duration of the user voice signal has reached a preset duration.

In this case the question-end silence time is determined in real time. When the user begins a sentence, the question-end silence time for that sentence has not yet been determined. So that it can respond to the user voice signal, the server monitors, while obtaining the signal, whether its duration has reached the preset duration, and once it has, performs the step of determining the speech rate information of the user voice signal.

The preset duration can be chosen according to how long a typical user takes to say a sentence and is not specifically limited here; in general it only needs to ensure that the text corresponding to the user voice signal contains two or more characters.

Correspondingly, the step of determining the question-end silence time according to the speech rate information and the preset silence time setting rule may include:

When the server detects that the duration of the user voice signal has reached the preset duration, it determines the speech rate information of the signal and then, according to the speech rate information and the preset silence time setting rule, determines the question-end silence time corresponding to the currently obtained user voice signal, which can be understood as the question-end silence time for the sentence the user is currently speaking.

For example, suppose the preset duration is 500 milliseconds. When the server detects that the duration of the user voice signal has reached 500 milliseconds, it determines the speech rate information of the signal and then determines the corresponding question-end silence time according to the speech rate information and the preset silence time setting rule. Suppose the question-end silence time so determined is 600 milliseconds. Because the server monitors the silence time of the user voice signal while receiving it, when the monitored silence time reaches 600 milliseconds the server judges that the user's current question has ended and proceeds with recognition, parsing and other processing in order to respond to the user instruction corresponding to the user voice signal.
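The 500 ms / 600 ms example can be simulated with a small frame-based sketch. Several things here are assumptions rather than details from the text: the signal is modelled as a sequence of fixed-length frames already labelled speech or silence, the frame length is 100 ms, `decide_silence_ms` is a hypothetical callable standing in for steps S102 and S103, and a default silence time is applied until the preset duration has been reached (the text does not say what applies before then).

```python
from typing import Callable, Iterable, Optional

FRAME_MS = 100  # hypothetical frame length


def detect_question_end(
    frames: Iterable[bool],
    preset_duration_ms: int,
    decide_silence_ms: Callable[[], int],
    default_silence_ms: int = 800,  # assumed fallback before the decision
) -> Optional[int]:
    """Return the elapsed time (ms) at which the question is judged ended.

    frames yields True for a speech frame and False for a silent frame.
    Returns None if the stream stops before enough trailing silence is seen.
    """
    elapsed_ms = 0
    silence_ms = 0
    threshold_ms = default_silence_ms
    decided = False
    for is_speech in frames:
        elapsed_ms += FRAME_MS
        # Trailing silence resets whenever speech is heard again.
        silence_ms = 0 if is_speech else silence_ms + FRAME_MS
        if not decided and elapsed_ms >= preset_duration_ms:
            # Signal duration reached the preset duration: determine the
            # question-end silence time for the current utterance.
            threshold_ms = decide_silence_ms()
            decided = True
        if silence_ms >= threshold_ms:
            return elapsed_ms
    return None
```

With 800 ms of speech followed by silence, a preset duration of 500 ms and a determined silence time of 600 ms, the end of the question is declared 600 ms after the speech stops.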

It can be seen that, in this embodiment, the server can dynamically set the question-end silence time for each utterance from the user voice signal obtained in real time. Even when different users use the same intelligent voice terminal, each utterance can be responded to accurately according to the speech rate of the user who produced it, which further improves the user experience.

For the case where the speech rate information is the average speech rate, in one implementation of the embodiment of the present invention, as shown in Fig. 2, the step of determining the speech rate information of the user voice signal may include:

S201: obtaining the duration of the user voice signal;

The server can obtain the duration of the user voice signal by, for example, recording the duration while receiving the signal. Any method of obtaining the duration of a voice signal used in the field of speech signal processing may be applied, so the method is not limited or described further here.

If the user voice signal obtained by the server is the user voice signal collected by the intelligent voice terminal within the preset period described above, the server obtains the total duration of all or part of those user voice signals. For example, if the server obtains the user voice signals from one week, the duration it obtains may be the total duration of all user voice signals in that week, or the total duration of part of the user voice signals in that week.

If the server obtains the user voice signal collected by the intelligent voice terminal at the current moment, the duration it obtains is the duration of that user voice signal.

S202: performing speech recognition on the user voice signal to obtain the character count corresponding to the user voice signal;

Next, the server can perform speech recognition on the obtained user voice signal and thereby obtain the corresponding character count. When performing speech recognition on the user voice signal, the server obtains the text content corresponding to the signal, from which the character count follows.

For example, if speech recognition of the user voice signal yields the text content "play the next song" (five characters in the original Chinese), the server can determine that the character count corresponding to the user voice signal is 5.

The user voice signal used for the character count is the same user voice signal used in step S201 to determine the duration. That is, if the user voice signal is part of the voice signals within the preset period, the character count is obtained by performing speech recognition on that same part.

S203: determining the average speech rate of the user voice signal according to the character count and the duration.

After obtaining the character count corresponding to the user voice signal, the server can determine the average speech rate of the user voice signal from the character count and the duration above. The speech rate is the speed at which the user speaks and can be expressed as the number of characters spoken per unit time, that is, the quotient of the character count and the duration of the user voice signal.

For example, if the character count corresponding to the user voice signal is 6 and the duration of the signal is 3 seconds, the average speech rate of the user voice signal is 6 / 3 = 2 characters per second; that is, the user speaks at an average of 2 characters per second.
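The quotient in step S203 can be written as a one-line helper. The function name and the choice to accept the duration in milliseconds (as a server might record it) are assumptions; the calculation itself is the character count divided by the duration.

```python
def average_speech_rate(char_count: int, duration_ms: float) -> float:
    """Average speech rate in characters per second (step S203)."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return char_count / (duration_ms / 1000.0)
```

For the worked example above, 6 characters over 3000 ms gives 2 characters per second.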

It can be seen that, in this embodiment, the server obtains the duration of the user voice signal, performs speech recognition on the signal to obtain the corresponding character count, and then determines the average speech rate from the character count and the duration. The speech rate information of the user voice signal can thus be determined quickly and accurately, which improves the speed and accuracy of the subsequent determination of the question-end silence time.

For the case where the word speed information is the average word speed, as an implementation of the embodiment of the present invention, the step of determining the question-end mute time according to the word speed information and the preset mute time setting rule may include:

In this embodiment, the server can determine the question-end mute time according to the size relation between the average word speed calculated in the embodiment shown in Fig. 2 and a preset word speed threshold. The preset word speed threshold can be determined from factors such as statistics on the average word speed at which people generally speak; for example, it may be 3 words per second, 4 words per second, 5 words per second, and so on, and is not specifically limited here. There may be one preset word speed threshold or several; both are reasonable and neither is specifically limited here.

In this case, as an implementation of the embodiment of the present invention, the preset word speed threshold may include a first preset word speed threshold and a second preset word speed threshold, where the first preset word speed threshold is less than the second preset word speed threshold.

In one implementation, the average word speed of people who generally speak slowly may be used as the first preset word speed threshold, and the average word speed of people who generally speak quickly may be used as the second preset word speed threshold.

Correspondingly, the step of determining the question-end mute time according to the size relation between the average word speed and the preset word speed threshold may include:

When the average word speed is less than the first preset word speed threshold, determining that the question-end mute time is a first mute time; when the average word speed is greater than the first preset word speed threshold and less than the second preset word speed threshold, determining that the question-end mute time is a second mute time; when the average word speed is greater than the second preset word speed threshold, determining that the question-end mute time is a third mute time. The first mute time is greater than the second mute time, and the second mute time is greater than the third mute time.

When determining the question-end mute time according to the size relation between the average word speed and the preset word speed threshold, the server can compare the average word speed with the first preset word speed threshold and the second preset word speed threshold. If the average word speed is less than the first preset word speed threshold, the average word speed is slow, which indicates that the user speaks slowly, and the server can determine that the question-end mute time is the first mute time. It can be understood that the first mute time should be relatively long, so that the intelligent voice terminal does not respond prematurely, before the user has finished speaking, when responding to a user instruction. Typically, the first mute time may be 700 milliseconds, which ensures that the intelligent voice terminal does not respond prematurely while also ensuring that the response is not too slow.

If the average word speed is greater than the first preset word speed threshold and less than the second preset word speed threshold, the average word speed is moderate, neither very fast nor very slow, which indicates that the user speaks at a moderate pace, and the server can determine that the question-end mute time is the second mute time. It can be understood that the second mute time should be neither too long nor too short. Typically, the second mute time may be 500 milliseconds, which ensures that the intelligent voice terminal does not respond prematurely while also ensuring that the response is not too slow.

If the average word speed is greater than the second preset word speed threshold, the average word speed is fast, which indicates that the user speaks quickly, and the server can determine that the question-end mute time is the third mute time. It can be understood that the third mute time should be relatively short, so that, while ensuring as far as possible that the intelligent voice terminal does not respond prematurely, the response speed is increased as much as possible and the user is not kept waiting long after speaking. Typically, the third mute time may be 300 milliseconds, which ensures that the intelligent voice terminal does not respond prematurely while increasing the response speed as much as possible.

As can be seen, in this embodiment the server can set three question-end mute times of different lengths according to the size relation between the average word speed and the first and second preset word speed thresholds. This ensures as far as possible that the intelligent voice terminal does not respond prematurely when responding to a user instruction, increases the response speed as much as possible, adapts to the speaking habits of different users, and further improves the user experience.
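The three-tier rule described above can be sketched as follows. The threshold values (3 and 5 words per second) are assumptions chosen from the example values the description mentions, and the names are hypothetical. The description does not say which tier applies when the average word speed exactly equals a threshold; this sketch assigns equality to the faster tier, which is also an assumption.

```python
# Typical mute times given in the description, in milliseconds.
FIRST_MUTE_MS = 700   # slow speakers
SECOND_MUTE_MS = 500  # moderate speakers
THIRD_MUTE_MS = 300   # fast speakers


def mute_time_from_speed(avg_speed: float,
                         first_threshold: float = 3.0,   # assumed value
                         second_threshold: float = 5.0   # assumed value
                         ) -> int:
    """Map an average word speed (words per second) to a question-end mute time."""
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be less than second threshold")
    if avg_speed < first_threshold:
        return FIRST_MUTE_MS
    if avg_speed < second_threshold:
        # Assumption: a speed equal to the first threshold lands in this tier.
        return SECOND_MUTE_MS
    # Assumption: a speed equal to the second threshold lands in this tier.
    return THIRD_MUTE_MS
```

With these assumed thresholds, the 2 words per second speaker from the earlier example would be given the 700 millisecond mute time.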

For the case where the word speed information is the average interval time between words, as an implementation of the embodiment of the present invention, as shown in Fig. 3, the step of determining the word speed information of the user voice signal may include:

S301: performing speech recognition on the user voice signal to obtain the interval times between adjacent text in the text corresponding to the user voice signal;

While receiving the user voice signal, the server can perform speech recognition on it and thereby obtain the interval times between adjacent text in the text corresponding to the user voice signal. It should be noted that the interval time between adjacent text refers to the interval between units that each form one word, as delimited according to the conventions of the language concerned.

For example, if the text corresponding to the user voice signal is "what are you doing" (five characters in the original Chinese), the interval times between adjacent text are the four interval times between the first and second characters, between the second and third characters, between the third and fourth characters, and between the fourth and fifth characters.

If the user voice signal obtained by the server is the user voice signals collected by the intelligent voice terminal within the above preset time, the server can obtain the interval times between adjacent text in the text corresponding to all or part of those user voice signals. For example, if the user voice signals obtained by the server are those collected within 3 days, the interval times obtained by the server are the interval times between adjacent text in the text corresponding to all or part of the user voice signals collected within those 3 days.

If the server obtains the user voice signal collected by the intelligent voice terminal at the current time, the interval times obtained by the server are the interval times between adjacent text in the text corresponding to the user voice signal collected by the intelligent voice terminal at the current time.

As for the specific manner of obtaining the interval times between adjacent text in the text corresponding to the user voice signal, they may be determined, for example, from the times corresponding to the peaks and troughs in the frequency spectrum or waveform diagram of the voice signal; this is not specifically limited here.

S302: calculating the average interval time corresponding to the user voice signal according to the interval times.

After the interval times have been determined, the server can calculate the average interval time corresponding to the user voice signal. For example, if the text corresponding to the user voice signal is "what are you doing" and the four interval times between its adjacent characters are 400 milliseconds, 450 milliseconds, 420 milliseconds, and 435 milliseconds respectively, the average interval time corresponding to the user voice signal is (400+450+420+435)/4 = 426.25 milliseconds.

As can be seen, in this embodiment the server can perform speech recognition on the user voice signal to obtain the interval times between adjacent text in the corresponding text, and then determine the average interval time of the user voice signal from those interval times. The word speed information of the user voice signal can thus be determined quickly and accurately, which improves the speed and accuracy of the subsequent determination of the question-end mute time.
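Steps S301 and S302 can be sketched as follows. The description derives interval times from peaks and troughs in the spectrum or waveform; this sketch instead assumes, purely for illustration, that a recognizer already supplies a (start, end) time in milliseconds for each character. The function names and that input format are hypothetical.

```python
from typing import List, Sequence, Tuple


def gaps_from_spans(spans_ms: Sequence[Tuple[int, int]]) -> List[int]:
    """S301 (assumed input): gap between the end of one character and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(spans_ms, spans_ms[1:])]


def mean_interval_ms(intervals_ms: Sequence[float]) -> float:
    """S302: average interval time = sum of interval times / number of intervals."""
    if not intervals_ms:
        raise ValueError("at least one interval is required")
    return sum(intervals_ms) / len(intervals_ms)


# Worked example from the description: four gaps between five characters.
print(mean_interval_ms([400, 450, 420, 435]))  # 426.25
```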

For the case where the word speed information is the average interval time, as an implementation of the embodiment of the present invention, the step of determining the question-end mute time according to the word speed information and the preset mute time setting rule may include:

In this embodiment, the server can determine the question-end mute time according to the size relation between the average interval time calculated in the embodiment shown in Fig. 3 and a preset time threshold. The preset time threshold can be determined from factors such as statistics on the interval time between words when people generally speak; for example, it may be 350 milliseconds, 400 milliseconds, 450 milliseconds, and so on, and is not specifically limited here. There may be one preset time threshold or several; both are reasonable and neither is specifically limited here.

In this case, as an implementation of the embodiment of the present invention, the preset time threshold may include a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold.

In one implementation, the average interval time between words of people who generally speak slowly may be used as the first preset time threshold, and the average interval time between words of people who generally speak quickly may be used as the second preset time threshold.

Correspondingly, the step of determining the question-end mute time according to the size relation between the average interval time and the preset time threshold may include:

When the average interval time is greater than the first preset time threshold, determining that the question-end mute time is a fourth mute time; when the average interval time is less than the first preset time threshold and greater than the second preset time threshold, determining that the question-end mute time is a fifth mute time; when the average interval time is less than the second preset time threshold, determining that the question-end mute time is a sixth mute time. The fourth mute time is greater than the fifth mute time, and the fifth mute time is greater than the sixth mute time.

When determining the question-end mute time according to the size relation between the average interval time and the preset time threshold, the server can compare the average interval time with the first preset time threshold and the second preset time threshold. If the average interval time is greater than the first preset time threshold, the average interval time is long, which indicates that the user leaves long intervals between words when speaking, and the server can determine that the question-end mute time is the fourth mute time. It can be understood that the fourth mute time should be relatively long, so that the intelligent voice terminal does not respond prematurely when responding to a user instruction. Typically, the fourth mute time may be 700 milliseconds, which ensures that the intelligent voice terminal does not respond prematurely while also ensuring that the response is not too slow.

If the average interval time is less than the first preset time threshold and greater than the second preset time threshold, the average interval time is moderate, neither very long nor very short, which indicates that the user leaves moderate intervals between words when speaking, and the server can determine that the question-end mute time is the fifth mute time. It can be understood that the fifth mute time should be neither too long nor too short. Typically, the fifth mute time may be 500 milliseconds, which ensures that the intelligent voice terminal does not respond prematurely while also ensuring that the response is not too slow.

If the average interval time is less than the second preset time threshold, the average interval time is short, which indicates that the user leaves short intervals between words when speaking, and the server can determine that the question-end mute time is the sixth mute time. It can be understood that the sixth mute time should be relatively short, so that, while ensuring as far as possible that the intelligent voice terminal does not respond prematurely, the response speed is increased as much as possible and the user is not kept waiting long after speaking. Typically, the sixth mute time may be 300 milliseconds, which ensures that the intelligent voice terminal does not respond prematurely while increasing the response speed as much as possible.

As can be seen, in this embodiment the server can set three question-end mute times of different lengths according to the size relation between the average interval time and the first and second preset time thresholds. This ensures as far as possible that the intelligent voice terminal does not respond prematurely when responding to a user instruction, increases the response speed as much as possible, adapts to the speaking habits of different users, and further improves the user experience.
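The interval-based rule can be sketched in the same way. Note that the comparison runs in the opposite direction to the average word speed rule: a longer average interval means slower speech and therefore a longer mute time. The threshold values (450 and 350 milliseconds) are assumptions taken from the example values in the description, the names are hypothetical, and the handling of an average interval exactly equal to a threshold is likewise an assumption.

```python
FOURTH_MUTE_MS = 700  # long gaps between words (slow speech)
FIFTH_MUTE_MS = 500   # moderate gaps
SIXTH_MUTE_MS = 300   # short gaps (fast speech)


def mute_time_from_interval(avg_interval_ms: float,
                            first_threshold_ms: float = 450.0,   # assumed value
                            second_threshold_ms: float = 350.0   # assumed value
                            ) -> int:
    """Map an average interval time between words to a question-end mute time."""
    if first_threshold_ms <= second_threshold_ms:
        raise ValueError("first threshold must be greater than second threshold")
    if avg_interval_ms > first_threshold_ms:
        return FOURTH_MUTE_MS
    if avg_interval_ms > second_threshold_ms:
        # Assumption: an interval equal to the first threshold lands in this tier.
        return FIFTH_MUTE_MS
    # Assumption: an interval equal to the second threshold lands in this tier.
    return SIXTH_MUTE_MS
```

Under these assumed thresholds, the 426.25 millisecond average interval from the worked example falls in the middle tier and yields a 500 millisecond mute time.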

As an implementation of the embodiment of the present invention, after the step of determining the question-end mute time according to the word speed information and the preset mute time setting rule, the method may further include:

When it is detected that the mute time corresponding to the user voice signal currently collected by the intelligent voice terminal reaches the determined question-end mute time, responding to the user instruction corresponding to the currently collected user voice signal.

The user instruction is an instruction determined from the semantics of the currently collected user voice signal. For example, if the server determines through speech recognition that the semantics of the currently collected user voice signal are "what is the weather like today", the user instruction may be "play the weather forecast". As another example, if the server determines through speech recognition that the semantics of the currently collected user voice signal are "play the next song", the user instruction may be "play the next song".

As described above, while receiving the user voice signal sent by the intelligent voice terminal, the server detects in real time the mute time corresponding to the user voice signal. When it detects that the mute time corresponding to the user voice signal currently collected by the intelligent voice terminal reaches the determined question-end mute time, the user's current question has ended. The server can then perform speech recognition on the received user voice signal, determine the semantics of the currently collected user voice signal and the corresponding user instruction, and respond to that user instruction.

For example, if the user instruction is "play the weather forecast", the server can obtain weather forecast information from network resources or by other means and send it to the intelligent voice terminal, so that the intelligent voice terminal plays the weather forecast information and the user learns the forecast.

As can be seen, in this embodiment, when the server detects that the mute time corresponding to the user voice signal currently collected by the intelligent voice terminal reaches the determined question-end mute time, it responds to the user instruction corresponding to the currently collected user voice signal. The server can thus judge from the determined question-end mute time that the user's question has ended and respond to the user instruction, giving a better user experience.
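The detection described above can be sketched as a counter over fixed-length audio frames. The sketch assumes that some voice activity detector, which is not part of the description, has already labelled each frame as speech or silence; the function name, the frame representation, and the 100 millisecond frame length in the usage example are all hypothetical.

```python
from typing import Iterable, Optional


def detect_question_end(frames: Iterable[bool],
                        frame_ms: int,
                        mute_time_ms: int) -> Optional[int]:
    """Return the index of the frame at which the silence following speech
    first reaches mute_time_ms, or None if that never happens.

    frames: True where a frame contains speech, False where it is silent
            (assumed to come from an external voice activity detector).
    """
    silence_ms = 0
    heard_speech = False
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silence_ms = 0          # any speech resets the silence counter
        elif heard_speech:
            silence_ms += frame_ms  # accumulate silence after speech
            if silence_ms >= mute_time_ms:
                return index        # question has ended; respond now
    return None


# Speech, a 300 ms pause inside the question, more speech, then silence.
frames = [True] * 5 + [False] * 3 + [True] * 2 + [False] * 10
print(detect_question_end(frames, frame_ms=100, mute_time_ms=500))  # 14
print(detect_question_end(frames, frame_ms=100, mute_time_ms=300))  # 7
```

In this usage example the 300 millisecond setting fires during the mid-question pause (frame 7), whereas the 500 millisecond setting waits until the user has actually finished (frame 14), which illustrates why a slower speaker needs the longer mute time.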

Corresponding to the above method embodiment, the embodiment of the present invention also provides an apparatus for determining a question-end mute time. The apparatus for determining a question-end mute time provided by the embodiment of the present invention is introduced below.

As shown in Fig. 4, an apparatus for determining a question-end mute time includes:

A voice signal obtaining module 410, configured to obtain the user voice signal collected by the intelligent voice terminal;

A word speed information determining module 420, configured to determine the word speed information of the user voice signal;

The word speed information is information that identifies the word speed feature of the user voice signal.

A mute time determining module 430, configured to determine the question-end mute time according to the word speed information and the preset mute time setting rule.

As can be seen, in the scheme provided by the embodiment of the present invention, the user voice signal collected by the intelligent voice terminal is first obtained, the word speed information of the user voice signal is then determined, and the question-end mute time is finally determined according to the word speed information and the preset mute time setting rule, where the word speed information is information that identifies the word speed feature of the user voice signal. When the question-end mute time is determined in this manner, a reasonable question-end mute time can be set according to the user's word speed feature, so that the intelligent voice terminal can respond accurately to users with different word speeds, which greatly improves the response accuracy of the intelligent voice terminal and the user experience.

As an implementation of the embodiment of the present invention, the voice signal obtaining module 410 may include:

A real-time obtaining submodule (not shown in Fig. 4), configured to obtain in real time the user voice signal collected by the intelligent voice terminal;

The apparatus may further include:

A preset duration monitoring module (not shown in Fig. 4), configured to monitor, before the word speed information of the user voice signal is determined, that the duration of the user voice signal reaches a preset duration;

The mute time determining module 430 may include:

A mute time determining submodule (not shown in Fig. 4), configured to determine, according to the word speed information and the preset mute time setting rule, the question-end mute time corresponding to the currently obtained user voice signal.

As an implementation of the embodiment of the present invention, the word speed information may be the average word speed;

The word speed information determining module may include:

A duration obtaining submodule (not shown in Fig. 4), configured to obtain the duration of the user voice signal;

A text quantity determining submodule (not shown in Fig. 4), configured to perform speech recognition on the user voice signal to obtain the text quantity corresponding to the user voice signal;

An average word speed determining submodule (not shown in Fig. 4), configured to determine the average word speed of the user voice signal according to the text quantity and the duration.

As an implementation of the embodiment of the present invention, the mute time determining module 430 may include:

A first determining submodule (not shown in Fig. 4), configured to determine the question-end mute time according to the size relation between the average word speed and the preset word speed threshold.

As an implementation of the embodiment of the present invention, the preset word speed threshold may include a first preset word speed threshold and a second preset word speed threshold, where the first preset word speed threshold is less than the second preset word speed threshold;

The first determining submodule may include:

A first determining unit (not shown in Fig. 4), configured to determine that the question-end mute time is the first mute time when the average word speed is less than the first preset word speed threshold;

A second determining unit (not shown in Fig. 4), configured to determine that the question-end mute time is the second mute time when the average word speed is greater than the first preset word speed threshold and less than the second preset word speed threshold;

A third determining unit (not shown in Fig. 4), configured to determine that the question-end mute time is the third mute time when the average word speed is greater than the second preset word speed threshold, where the first mute time is greater than the second mute time and the second mute time is greater than the third mute time.

As an implementation of the embodiment of the present invention, the word speed information may be the average interval time between words;

The word speed information determining module may include:

An interval time determining submodule (not shown in Fig. 4), configured to perform speech recognition on the user voice signal to obtain the interval times between adjacent text in the text corresponding to the user voice signal;

An average interval time determining submodule (not shown in Fig. 4), configured to calculate the average interval time corresponding to the user voice signal according to the interval times.

A second determining submodule (not shown in Fig. 4), configured to determine the question-end mute time according to the size relation between the average interval time and the preset time threshold.

As an implementation of the embodiment of the present invention, the preset time threshold may include a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold;

The second determining submodule may include:

A fourth determining unit (not shown in Fig. 4), configured to determine that the question-end mute time is the fourth mute time when the average interval time is greater than the first preset time threshold;

A fifth determining unit (not shown in Fig. 4), configured to determine that the question-end mute time is the fifth mute time when the average interval time is less than the first preset time threshold and greater than the second preset time threshold;

A sixth determining unit (not shown in Fig. 4), configured to determine that the question-end mute time is the sixth mute time when the average interval time is less than the second preset time threshold.

The fourth mute time is greater than the fifth mute time, and the fifth mute time is greater than the sixth mute time.

As an implementation of the embodiment of the present invention, the apparatus may further include:

An instruction response module (not shown in Fig. 4), configured to respond, after the question-end mute time has been determined according to the word speed information and the preset mute time setting rule, to the user instruction corresponding to the currently collected user voice signal when it is detected that the mute time corresponding to the user voice signal currently collected by the intelligent voice terminal reaches the determined question-end mute time.

The user instruction is an instruction determined from the semantics of the currently collected user voice signal.

The embodiment of the present invention also provides an electronic device which, as shown in Fig. 5, includes a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 communicate with one another through the communication bus 504,

The memory 503 is configured to store a computer program;

The processor 501 is configured to implement the following steps when executing the program stored in the memory 503:

Obtaining the user voice signal collected by the intelligent voice terminal;

As can be seen, in the scheme provided by the embodiment of the present invention, the electronic device can first obtain the user voice signal collected by the intelligent voice terminal, then determine the word speed information of the user voice signal, and finally determine the question-end mute time according to the word speed information and the preset mute time setting rule, where the word speed information is information that identifies the word speed feature of the user voice signal. When the question-end mute time is determined in this manner, a reasonable question-end mute time can be set according to the user's word speed feature, so that the intelligent voice terminal can respond accurately to users with different word speeds, which greatly improves the response accuracy of the intelligent voice terminal and the user experience.

The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the above electronic device and other devices.

The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The step of obtaining the user voice signal collected by the intelligent voice terminal may include:

Before the step of determining the word speed information of the user voice signal, the method may include:

Monitoring that the duration of the user voice signal reaches the preset duration;

The step of determining the question-end mute time according to the word speed information and the preset mute time setting rule may include:

The word speed information may be the average word speed;

The step of determining the word speed information of the user voice signal may include:

Obtaining the duration of the user voice signal;

The step of determining the question-end mute time according to the word speed information and the preset mute time setting rule may include:

The preset word speed threshold may include a first preset word speed threshold and a second preset word speed threshold, where the first preset word speed threshold is less than the second preset word speed threshold;

The step of determining the question-end mute time according to the size relation between the average word speed and the preset word speed threshold may include:

The word speed information may be the average interval time between words;

The preset time threshold may include a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold;

The step of determining the question-end mute time according to the size relation between the average interval time and the preset time threshold may include:

After the step of determining the question-end mute time according to the word speed information and the preset mute time setting rule, the method may further include:

The embodiment of the present invention also provides a computer-readable storage medium in which a computer program is stored, and the computer program implements the following steps when executed by a processor:

Obtaining the user voice signal collected by the intelligent voice terminal;

As can be seen, in the scheme provided by the embodiment of the present invention, when the computer program is executed by the processor, the user voice signal collected by the intelligent voice terminal is first obtained, the word speed information of the user voice signal is then determined, and the question-end mute time is finally determined according to the word speed information and the preset mute time setting rule, where the word speed information is information that identifies the word speed feature of the user voice signal. When the question-end mute time is determined in this manner, a reasonable question-end mute time can be set according to the user's word speed feature, so that the intelligent voice terminal can respond accurately to users with different word speeds, which greatly improves the response accuracy of the intelligent voice terminal and the user experience.

Monitoring that the duration of the user voice signal reaches the preset duration;

The word speed information may be the average word speed;

Obtaining the duration of the user voice signal;

It should be noted that for above-mentioned apparatus, electronic equipment and computer readable storage medium embodiment, due to It is substantially similar to embodiment of the method, so being described relatively simple, related place is referring to the part explanation of embodiment of the method It can.

It should further be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims

1. A method for determining a question-end mute time, characterized in that the method comprises:

Obtaining a user voice signal collected by an intelligent voice terminal;

Determining word speed information of the user voice signal, wherein the word speed information is information identifying a word speed feature of the user voice signal;

2. The method according to claim 1, characterized in that the step of obtaining a user voice signal collected by an intelligent voice terminal comprises:

Detecting that the duration of the user voice signal reaches a preset duration;

The step of determining the question-end mute time according to the word speed information and the preset mute time setting rule comprises:

Determining, according to the word speed information and the preset mute time setting rule, the question-end mute time corresponding to the currently obtained user voice signal.

3. The method according to claim 1, characterized in that the word speed information is an average word speed;

Obtaining the duration of the user voice signal;

4. The method according to claim 3, characterized in that the step of determining the question-end mute time according to the word speed information and the preset mute time setting rule comprises:

5. The method according to claim 4, characterized in that the preset word speed threshold comprises a first preset word speed threshold and a second preset word speed threshold, wherein the first preset word speed threshold is less than the second preset word speed threshold;

The step of determining the question-end mute time according to the magnitude relationship between the average word speed and the preset word speed threshold comprises:

When the average word speed is less than the first preset word speed threshold, determining that the question-end mute time is a first mute time;

When the average word speed is greater than the first preset word speed threshold and less than the second preset word speed threshold, determining that the question-end mute time is a second mute time;

When the average word speed is greater than the second preset word speed threshold, determining that the question-end mute time is a third mute time, wherein the first mute time is greater than the second mute time, and the second mute time is greater than the third mute time.
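For illustration only, and not as part of the claim language, the three-tier rule of claim 5 could be sketched as below. The class name, field names, and example values are hypothetical. The claim does not say which tier applies when the average word speed exactly equals a threshold, so this sketch assumes the equal case falls into the next faster tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MuteTimeRule:
    """Hypothetical container for the two thresholds and three mute times."""
    first_threshold: float   # first preset word speed threshold
    second_threshold: float  # second preset word speed threshold
    first_mute_s: float      # first mute time, in seconds
    second_mute_s: float     # second mute time, in seconds
    third_mute_s: float      # third mute time, in seconds

    def __post_init__(self) -> None:
        # The claim requires first threshold < second threshold.
        if not self.first_threshold < self.second_threshold:
            raise ValueError("first_threshold must be less than second_threshold")
        # The claim requires first mute time > second > third.
        if not self.first_mute_s > self.second_mute_s > self.third_mute_s:
            raise ValueError("mute times must strictly decrease")

    def mute_time(self, average_word_speed: float) -> float:
        if average_word_speed < self.first_threshold:
            return self.first_mute_s
        # Assumption: a speed equal to a threshold goes to the faster tier.
        if average_word_speed < self.second_threshold:
            return self.second_mute_s
        return self.third_mute_s
```

With the hypothetical values `MuteTimeRule(2.0, 4.0, 1.5, 1.0, 0.5)`, a slow speaker at 1.0 words per second gets a 1.5 s mute time, while a fast speaker at 5.0 words per second gets 0.5 s.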

6. The method according to claim 1, characterized in that the word speed information is the average interval time between words;

Performing speech recognition on the user voice signal to obtain the interval time between adjacent words in the text corresponding to the user voice signal;
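For illustration only, and not as part of the claim language, the average interval time could be computed as below once speech recognition has produced a time span for each word. The `(start, end)` timestamp format and the function name are assumptions; the claim does not specify how the recognizer reports intervals.

```python
from typing import Sequence, Tuple


def average_interval_time(word_spans: Sequence[Tuple[float, float]]) -> float:
    """Mean gap, in seconds, between adjacent recognized words.

    word_spans holds one hypothetical (start_s, end_s) pair per word,
    in time order. Overlapping words are treated as a zero-length gap.
    """
    if len(word_spans) < 2:
        raise ValueError("at least two words are needed to measure an interval")
    gaps = [
        max(0.0, next_start - prev_end)
        for (_, prev_end), (next_start, _) in zip(word_spans, word_spans[1:])
    ]
    return sum(gaps) / len(gaps)
```

The resulting average can then be compared with a preset time threshold in the same way that the average word speed is compared with a preset word speed threshold.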

7. The method according to claim 6, characterized in that the step of determining the question-end mute time according to the word speed information and the preset mute time setting rule comprises:

8. An apparatus for determining a question-end mute time, characterized in that the apparatus comprises:

A word speed information determining module, configured to determine word speed information of the user voice signal, wherein the word speed information is information identifying a word speed feature of the user voice signal;

A mute time determining module, configured to determine the question-end mute time according to the word speed information and the preset mute time setting rule.

9. An electronic device, characterized by comprising a processor, a memory, and a communication bus, wherein the processor and the memory communicate with each other through the communication bus;

The memory is configured to store a computer program;

The processor is configured to implement the method steps of any one of claims 1-9 when executing the program stored in the memory.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-9.