CN108962283A - Method, apparatus and electronic device for determining an end-of-question mute time - Google Patents
- Publication number: CN108962283A
- Application number: CN201810083491.9A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G10L: Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding (section G, Physics; class G10, Musical instruments; acoustics)
- G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26: Speech to text systems
- G10L15/28: Constructional details of speech recognition systems
- G10L25/78: Detection of presence or absence of voice signals
- G10L2025/783: Detection of presence or absence of voice signals based on threshold decision
Abstract
Embodiments of the invention provide a method, an apparatus and an electronic device for determining an end-of-question mute time. The method comprises: obtaining a user voice signal collected by a smart voice terminal; determining speech rate information of the user voice signal, wherein the speech rate information is information identifying the speech rate characteristics of the user voice signal; and determining the end-of-question mute time according to the speech rate information and a preset mute time setting rule. Determining the end-of-question mute time in this way allows it to be set reasonably according to each user's speech rate. The smart voice terminal can therefore respond accurately to users with different speech rates, which greatly improves its response accuracy and the user experience.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a method, an apparatus and an electronic device for determining an end-of-question mute time.
Background art
In recent years, artificial intelligence technology has developed rapidly, and many artificial intelligence devices have appeared on the market. Some of these devices embed intelligent voice technology. The user can control such a device by voice and interact with it by voice, for example to look up the weather, set an alarm clock, have it tell a story, or chat. Artificial intelligence devices that can interact with the user by voice may be called smart voice terminals, for example smart speakers and robots capable of voice interaction.
When a smart voice terminal implements the voice interaction function described above, the speed of its voice response is clearly very important. While the smart voice terminal collects the user voice signal, it sends the collected signal in real time to a server with which it has a communication connection. When the server receives the user voice signal, it monitors the mute time of the signal. When the mute time reaches a preset time, the server determines that the user voice signal has ended. In other words, once a period of silence follows the user's speech, the user's current question is judged to be finished, and the server performs parsing such as speech recognition on that segment of the user voice signal. This mute time may be called the end-of-question mute time; it marks the end of the question the user has just asked.
The end-of-question mute time of a typical smart voice terminal is preset and cannot be changed. However, different users speak at very different speech rates. A fixed end-of-question mute time therefore often forces a user who speaks quickly to wait a relatively long time after finishing the question before the terminal responds. A user who speaks slowly, by contrast, often has the terminal respond before they have finished a sentence. This way of determining the end-of-question mute time thus makes the terminal's responses inaccurate and leads to a poor user experience.
Summary of the invention
The embodiments of the present invention aim to provide a method, an apparatus and an electronic device for determining an end-of-question mute time, so as to improve the response accuracy of the smart voice terminal and the user experience. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a method for determining an end-of-question mute time, the method comprising:
Obtaining a user voice signal collected by a smart voice terminal;
Determining speech rate information of the user voice signal, wherein the speech rate information is information identifying the speech rate characteristics of the user voice signal;
Determining the end-of-question mute time according to the speech rate information and a preset mute time setting rule.
Optionally, the step of obtaining the user voice signal collected by the smart voice terminal comprises:
Obtaining, in real time, the user voice signal collected by the smart voice terminal;
Before the step of determining the speech rate information of the user voice signal, the method comprises:
Detecting that the duration of the user voice signal has reached a preset duration;
The step of determining the end-of-question mute time according to the speech rate information and the preset mute time setting rule comprises:
Determining, according to the speech rate information and the preset mute time setting rule, the end-of-question mute time corresponding to the currently obtained user voice signal.
Optionally, the speech rate information is an average speech rate;
The step of determining the speech rate information of the user voice signal comprises:
Obtaining the duration of the user voice signal;
Performing speech recognition on the user voice signal to obtain the number of text units corresponding to the user voice signal;
Determining the average speech rate of the user voice signal according to the number of text units and the duration.
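The average speech rate described above amounts to the number of recognized text units divided by the duration of the user voice signal. The following is a minimal sketch that assumes the text-unit count (from a speech recognizer) and the duration in seconds are already available. The function name is hypothetical and does not come from the patent.

```python
def average_speech_rate(text_units: int, duration_s: float) -> float:
    """Return the average speech rate in text units per second.

    text_units: number of text units (e.g. Chinese characters or English
        words) recognized in the user voice signal.
    duration_s: duration of the user voice signal, in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if text_units < 0:
        raise ValueError("text_units must be non-negative")
    return text_units / duration_s


# "今天天气怎么样" has 7 characters; spoken over 2.0 s this gives 3.5 units/s.
print(average_speech_rate(7, 2.0))  # 3.5
```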
Optionally, the step of determining the end-of-question mute time according to the speech rate information and the preset mute time setting rule comprises:
Determining the end-of-question mute time according to the magnitude relationship between the average speech rate and a preset speech rate threshold.
Optionally, the preset speech rate threshold includes a first preset speech rate threshold and a second preset speech rate threshold, wherein the first preset speech rate threshold is less than the second preset speech rate threshold;
The step of determining the end-of-question mute time according to the magnitude relationship between the average speech rate and the preset speech rate threshold comprises:
When the average speech rate is less than the first preset speech rate threshold, determining that the end-of-question mute time is a first mute time;
When the average speech rate is greater than the first preset speech rate threshold and less than the second preset speech rate threshold, determining that the end-of-question mute time is a second mute time;
When the average speech rate is greater than the second preset speech rate threshold, determining that the end-of-question mute time is a third mute time, wherein the first mute time is greater than the second mute time, and the second mute time is greater than the third mute time.
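The three-tier rule above can be sketched as a simple threshold lookup. The patent gives no concrete values, so every threshold and mute time below is a hypothetical placeholder. The patent also does not say what happens when the rate equals a threshold exactly. This sketch assumes a rate equal to a threshold falls into the faster tier.

```python
def mute_time_from_rate(
    rate: float,
    first_threshold: float = 3.0,   # hypothetical, text units per second
    second_threshold: float = 5.0,  # hypothetical, text units per second
    first_mute_ms: int = 800,       # hypothetical: slow speakers
    second_mute_ms: int = 600,      # hypothetical: medium speakers
    third_mute_ms: int = 400,       # hypothetical: fast speakers
) -> int:
    """Map an average speech rate to an end-of-question mute time in ms."""
    if not first_threshold < second_threshold:
        raise ValueError("first threshold must be below the second")
    if not first_mute_ms > second_mute_ms > third_mute_ms:
        raise ValueError("mute times must decrease as speech rate rises")
    if rate < first_threshold:
        return first_mute_ms
    if rate < second_threshold:
        return second_mute_ms  # rate == first_threshold lands here (assumption)
    return third_mute_ms       # rate == second_threshold lands here (assumption)


print(mute_time_from_rate(3.5))  # 600
```

Slower speakers receive a longer mute time, so a pause in the middle of a sentence is less likely to be mistaken for the end of the question.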
Optionally, the speech rate information is the average interval time between words;
The step of determining the speech rate information of the user voice signal comprises:
Performing speech recognition on the user voice signal to obtain the interval times between adjacent text units in the text corresponding to the user voice signal;
Calculating, according to the interval times, the average interval time corresponding to the user voice signal.
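One way to obtain the interval times is from per-unit timestamps, assuming the speech recognizer reports a start and end time (in milliseconds) for each recognized text unit. The patent does not specify this, so the input format is an assumption. The sketch measures each gap from the end of one unit to the start of the next. It treats overlapping timestamps as a zero gap, which is also an assumption.

```python
from typing import Sequence, Tuple


def average_interval_ms(units: Sequence[Tuple[int, int]]) -> float:
    """Average gap between adjacent text units.

    units: (start_ms, end_ms) of each recognized text unit, in spoken order.
    """
    if len(units) < 2:
        raise ValueError("need at least two text units to measure an interval")
    gaps = [
        max(0, next_start - cur_end)  # overlapping timestamps count as no gap
        for (_, cur_end), (next_start, _) in zip(units, units[1:])
    ]
    return sum(gaps) / len(gaps)


print(average_interval_ms([(0, 200), (300, 500), (700, 900)]))  # 150.0
```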
Optionally, the step of determining the end-of-question mute time according to the speech rate information and the preset mute time setting rule comprises:
Determining the end-of-question mute time according to the magnitude relationship between the average interval time and a preset time threshold.
Optionally, the preset time threshold includes a first preset time threshold and a second preset time threshold, wherein the first preset time threshold is greater than the second preset time threshold;
The step of determining the end-of-question mute time according to the magnitude relationship between the average interval time and the preset time threshold comprises:
When the average interval time is greater than the first preset time threshold, determining that the end-of-question mute time is a fourth mute time;
When the average interval time is less than the first preset time threshold and greater than the second preset time threshold, determining that the end-of-question mute time is a fifth mute time;
When the average interval time is less than the second preset time threshold, determining that the end-of-question mute time is a sixth mute time, wherein the fourth mute time is greater than the fifth mute time, and the fifth mute time is greater than the sixth mute time.
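The interval-based rule mirrors the rate-based one with the comparison reversed: a longer average gap indicates slower speech and therefore a longer mute time. The following is a sketch with hypothetical threshold and mute-time values, none of which come from the patent. A value equal to a threshold is assumed to fall into the shorter-mute tier.

```python
def mute_time_from_interval(
    avg_interval_ms: float,
    first_threshold_ms: float = 300.0,   # hypothetical; the larger threshold
    second_threshold_ms: float = 150.0,  # hypothetical; the smaller threshold
    fourth_mute_ms: int = 900,           # hypothetical: long gaps, slow speech
    fifth_mute_ms: int = 600,            # hypothetical
    sixth_mute_ms: int = 400,            # hypothetical: short gaps, fast speech
) -> int:
    """Map an average inter-word interval to an end-of-question mute time."""
    if not first_threshold_ms > second_threshold_ms:
        raise ValueError("first threshold must exceed the second")
    if not fourth_mute_ms > fifth_mute_ms > sixth_mute_ms:
        raise ValueError("mute times must decrease as intervals shrink")
    if avg_interval_ms > first_threshold_ms:
        return fourth_mute_ms
    if avg_interval_ms > second_threshold_ms:
        return fifth_mute_ms  # equal to first threshold lands here (assumption)
    return sixth_mute_ms      # equal to second threshold lands here (assumption)
```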
Optionally, after the step of determining the end-of-question mute time according to the speech rate information and the preset mute time setting rule, the method further comprises:
When it is detected that the mute time of the user voice signal currently collected by the smart voice terminal has reached the determined end-of-question mute time, responding to the user instruction corresponding to the currently collected user voice signal, wherein the user instruction is an instruction determined according to the semantics of the currently collected user voice signal.
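The response trigger above can be sketched as a trailing-silence counter over fixed-length audio frames. The sketch assumes that some voice activity detector (not specified in the patent) labels each frame as speech or non-speech. Silence before the user starts speaking is ignored, and any speech frame resets the count. Both choices are assumptions made for illustration.

```python
from typing import Optional, Sequence


def find_question_end(
    speech_flags: Sequence[bool], frame_ms: int, mute_time_ms: int
) -> Optional[int]:
    """Return the index of the frame at which the question is judged to have
    ended, or None if the trailing silence never reaches mute_time_ms.

    speech_flags: per-frame voice-activity decisions (True = speech).
    """
    heard_speech = False
    silence_ms = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # a pause inside the question resets the count
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= mute_time_ms:
                return i  # the server would now respond to the user instruction
    return None


print(find_question_end([True, True, False, False, False], 100, 200))  # 3
```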
In a second aspect, an embodiment of the present invention provides an apparatus for determining an end-of-question mute time, the apparatus comprising:
A voice signal obtaining module, configured to obtain a user voice signal collected by a smart voice terminal;
A speech rate information determining module, configured to determine speech rate information of the user voice signal, wherein the speech rate information is information identifying the speech rate characteristics of the user voice signal;
A mute time determining module, configured to determine the end-of-question mute time according to the speech rate information and a preset mute time setting rule.
Optionally, the voice signal obtaining module comprises:
A real-time obtaining submodule, configured to obtain, in real time, the user voice signal collected by the smart voice terminal;
The apparatus further comprises:
A preset duration monitoring module, configured to detect, before the speech rate information of the user voice signal is determined, that the duration of the user voice signal has reached a preset duration;
The mute time determining module comprises:
A mute time determining submodule, configured to determine, according to the speech rate information and the preset mute time setting rule, the end-of-question mute time corresponding to the currently obtained user voice signal.
Optionally, the speech rate information is an average speech rate;
The speech rate information determining module comprises:
A duration obtaining submodule, configured to obtain the duration of the user voice signal;
A text quantity determining submodule, configured to perform speech recognition on the user voice signal to obtain the number of text units corresponding to the user voice signal;
An average speech rate determining submodule, configured to determine the average speech rate of the user voice signal according to the number of text units and the duration.
Optionally, the mute time determining module comprises:
A first determining submodule, configured to determine the end-of-question mute time according to the magnitude relationship between the average speech rate and a preset speech rate threshold.
Optionally, the preset speech rate threshold includes a first preset speech rate threshold and a second preset speech rate threshold, wherein the first preset speech rate threshold is less than the second preset speech rate threshold;
The first determining submodule comprises:
A first determining unit, configured to determine that the end-of-question mute time is a first mute time when the average speech rate is less than the first preset speech rate threshold;
A second determining unit, configured to determine that the end-of-question mute time is a second mute time when the average speech rate is greater than the first preset speech rate threshold and less than the second preset speech rate threshold;
A third determining unit, configured to determine that the end-of-question mute time is a third mute time when the average speech rate is greater than the second preset speech rate threshold, wherein the first mute time is greater than the second mute time, and the second mute time is greater than the third mute time.
Optionally, the speech rate information is the average interval time between words;
The speech rate information determining module comprises:
An interval time determining submodule, configured to perform speech recognition on the user voice signal to obtain the interval times between adjacent text units in the text corresponding to the user voice signal;
An average interval time determining submodule, configured to calculate, according to the interval times, the average interval time corresponding to the user voice signal.
Optionally, the mute time determining module comprises:
A second determining submodule, configured to determine the end-of-question mute time according to the magnitude relationship between the average interval time and a preset time threshold.
Optionally, the preset time threshold includes a first preset time threshold and a second preset time threshold, wherein the first preset time threshold is greater than the second preset time threshold;
The second determining submodule comprises:
A fourth determining unit, configured to determine that the end-of-question mute time is a fourth mute time when the average interval time is greater than the first preset time threshold;
A fifth determining unit, configured to determine that the end-of-question mute time is a fifth mute time when the average interval time is less than the first preset time threshold and greater than the second preset time threshold;
A sixth determining unit, configured to determine that the end-of-question mute time is a sixth mute time when the average interval time is less than the second preset time threshold, wherein the fourth mute time is greater than the fifth mute time, and the fifth mute time is greater than the sixth mute time.
Optionally, the apparatus further comprises:
An instruction response module, configured to operate after the end-of-question mute time has been determined according to the speech rate information and the preset mute time setting rule. When it is detected that the mute time of the user voice signal currently collected by the smart voice terminal has reached the determined end-of-question mute time, the instruction response module responds to the user instruction corresponding to the currently collected user voice signal, wherein the user instruction is an instruction determined according to the semantics of the currently collected user voice signal.
In a third aspect, an embodiment of the present invention further provides an electronic device comprising a processor, a memory and a communication bus, wherein the processor and the memory communicate with each other through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the steps of the above method for determining an end-of-question mute time when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for determining an end-of-question mute time.
In the solution provided by the embodiments of the present invention, the user voice signal collected by the smart voice terminal is obtained first. Then the speech rate information of the user voice signal is determined, where the speech rate information is information identifying the speech rate characteristics of the user voice signal. Finally, the end-of-question mute time is determined according to the speech rate information and the preset mute time setting rule. Determining the end-of-question mute time in this way allows it to be set reasonably according to each user's speech rate. The smart voice terminal can therefore respond accurately to users with different speech rates, which greatly improves its response accuracy and the user experience.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for determining an end-of-question mute time according to an embodiment of the present invention;
Fig. 2 is a specific flowchart of step S102 in the embodiment shown in Fig. 1;
Fig. 3 is another specific flowchart of step S102 in the embodiment shown in Fig. 1;
Fig. 4 is a schematic structural diagram of an apparatus for determining an end-of-question mute time according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
To improve the response accuracy of smart voice terminals and the user experience, embodiments of the present invention provide a method, an apparatus, an electronic device and a computer-readable storage medium for determining an end-of-question mute time.
The method for determining an end-of-question mute time provided by the embodiments of the present invention is introduced first.
The method provided by the embodiments of the present invention may be applied to a server communicatively connected to a smart voice terminal, hereinafter simply "the server". The smart voice terminal may be any smart device that can be controlled by voice and can interact with the user by voice, for example a smart speaker or a voice robot; it is not specifically limited herein.
As shown in Fig. 1, the method for determining an end-of-question mute time comprises:
S101: obtaining the user voice signal collected by the smart voice terminal;
S102: determining speech rate information of the user voice signal, wherein the speech rate information is information identifying the speech rate characteristics of the user voice signal;
S103: determining the end-of-question mute time according to the speech rate information and a preset mute time setting rule.
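Steps S101 to S103 can be sketched end to end, assuming the received signal carries its sample count, sample rate and a transcript from an external speech recognizer. The sketch uses the average speech rate as the speech rate information. The `UserVoiceSignal` container, the function names and the example rule values are all hypothetical. The text-counting function and the mute time setting rule are passed in as parameters, because the patent leaves both open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserVoiceSignal:
    """Hypothetical container for one signal received from the terminal (S101)."""
    num_samples: int
    sample_rate_hz: int
    transcript: str  # assumed output of an external speech recognizer


def determine_mute_time_ms(
    signal: UserVoiceSignal,
    count_text_units: Callable[[str], int],
    mute_time_rule: Callable[[float], int],
) -> int:
    # S102: speech rate information, here the average speech rate (units/s).
    duration_s = signal.num_samples / signal.sample_rate_hz
    if duration_s <= 0:
        raise ValueError("empty user voice signal")
    rate = count_text_units(signal.transcript) / duration_s
    # S103: apply the preset mute time setting rule.
    return mute_time_rule(rate)


def example_rule(rate: float) -> int:
    """Hypothetical three-tier rule; thresholds and mute times are placeholders."""
    if rate < 3.0:
        return 800
    if rate < 5.0:
        return 600
    return 400


signal = UserVoiceSignal(num_samples=32000, sample_rate_hz=16000, transcript="今天天气怎么样")
print(determine_mute_time_ms(signal, len, example_rule))  # 600
```

Here `len` counts characters, which matches the Chinese convention of one text unit per character only when the transcript has no spaces or punctuation.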
It can be seen that, in the solution provided by the embodiments of the present invention, the server first obtains the user voice signal collected by the smart voice terminal. It then determines the speech rate information of the user voice signal, which is information identifying the speech rate characteristics of the signal. Finally, it determines the end-of-question mute time according to the speech rate information and the preset mute time setting rule. Determining the end-of-question mute time in this way allows it to be set reasonably according to each user's speech rate. The smart voice terminal can therefore respond accurately to users with different speech rates, which greatly improves its response accuracy and the user experience.
In step S101, when the user speaks, that is, produces a user voice signal, the smart voice terminal collects the user voice signal and sends it to the server in real time. The server thus obtains the user voice signal collected by the smart voice terminal.
In one implementation, the user voice signal obtained by the server may be the signal the smart voice terminal is collecting at the current moment, for example the voice signal for the one or several sentences the user is currently saying. The end-of-question mute time determined by the server then applies to those sentences. When the user finishes this question and the smart voice terminal collects the next segment of the user voice signal, the server determines a new end-of-question mute time for that segment. In this way, the end-of-question mute time is determined dynamically and in real time for each segment of the user voice signal.
In another implementation, the user voice signal obtained by the server may be the user voice signals collected by the smart voice terminal over a period of time, such as 3 days, 5 days or a week; the period is not specifically limited herein. That is, the server may determine the end-of-question mute time at a preset interval, according to the speech rate information of all or part of the user voice signals the smart voice terminal collected within that period.
After obtaining the user voice signal collected by the smart voice terminal, the server determines the speech rate information of the user voice signal, that is, it performs step S102. The speech rate information is information identifying the speech rate characteristics of the user voice signal, i.e. information indicating how fast the user speaks. Examples include the speech rate and the average interval time between words; the type is not specifically limited herein. The server may determine the speech rate information by any method commonly used in speech signal processing, such as speech recognition, which is not specifically limited or described here.
For example, suppose the speech rate information is the average speech rate and the preset period is 3 days. The server may then calculate the average speech rate of all or part of the user voice signals obtained within those 3 days and use it as the speech rate information. In this case, the server sets the end-of-question mute time once every 3 days.
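For the periodic case, one plausible way to compute a single average speech rate for the whole period is to pool the signals: divide the total number of text units by the total duration. Averaging the per-signal rates would be an alternative; the patent does not say which is meant, so this choice is an assumption. The record format and function name below are hypothetical.

```python
from typing import Sequence, Tuple


def pooled_average_speech_rate(records: Sequence[Tuple[int, float]]) -> float:
    """Average speech rate over a period, in text units per second.

    records: (text_units, duration_s) for each user voice signal collected
        by the terminal within the period (e.g. the last 3 days).
    """
    total_units = sum(units for units, _ in records)
    total_s = sum(duration for _, duration in records)
    if total_s <= 0:
        raise ValueError("no speech in the period")
    return total_units / total_s


print(pooled_average_speech_rate([(10, 2.0), (6, 2.0), (12, 4.0)]))  # 3.5
```

Pooling weights each signal by its duration, so a few very short utterances cannot dominate the period's estimate.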
Next, in step S103, the server determines the end-of-question mute time according to the speech rate information and the preset mute time setting rule, for example according to the magnitude relationship between the speech rate of the user voice signal and preset speech rate thresholds. For clarity of presentation, specific implementations of step S103 are described by example later.
It should be noted that, in this document, "text" and "word" refer to the unit of which a sentence is composed under the conventions of each language, usually a unit delimited by the pauses in the user's speech. In Chinese, for example, "text" and "word" refer to Chinese characters as conventionally divided in Chinese. The sentence "今天天气怎么样" ("How is the weather today?") therefore contains 7 characters: "今", "天", "天", "气", "怎", "么" and "样". In English, "text" and "word" may refer to a word. Similarly, in other languages such as Korean, Japanese and French, "text" and "word" may refer to the unit of which a sentence is composed under that language's own conventions; these are not enumerated here.
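The counting convention above can be sketched with a regular expression. Each CJK Unified Ideograph (basic block, U+4E00 to U+9FFF) counts as one unit. Each run of Latin letters, digits or apostrophes counts as one word. This covers only the Chinese and English cases described. Japanese kana, Korean Hangul and other scripts would need their own rules, which this sketch does not attempt.

```python
import re

# One unit per CJK ideograph (basic block); one unit per run of Latin
# letters, digits or apostrophes (an English-style word).
_UNIT_PATTERN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def count_text_units(text: str) -> int:
    """Count text units in a recognized transcript (Chinese/English only)."""
    return len(_UNIT_PATTERN.findall(text))


print(count_text_units("今天天气怎么样"))             # 7
print(count_text_units("How is the weather today?"))  # 5
```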
To adjust the end-of-question mute time of each user voice signal in real time, so that the smart voice terminal can respond accurately to voice signals from different users, one implementation of the embodiments of the present invention is as follows. The step of obtaining the user voice signal collected by the smart voice terminal may comprise: obtaining, in real time, the user voice signal collected by the smart voice terminal.
The server may obtain the user voice signal collected by the smart voice terminal in real time. That is, while the smart voice terminal collects the user voice signal, it sends the signal to the server, and the server receives the signal and processes it accordingly.
Correspondingly, before the step of determining the speech rate information of the user voice signal, the method may further comprise: detecting that the duration of the user voice signal has reached a preset duration.
In this case, the end-of-question mute time is determined in real time. While the user is saying a sentence, the end-of-question mute time for that sentence has not yet been determined. To be able to respond to the user voice signal, the server therefore monitors, while obtaining the signal, whether its duration has reached the preset duration. Once it has, the server performs the step of determining the speech rate information of the user voice signal.
The preset duration may be chosen according to how long a typical user takes to say a sentence and is not specifically limited herein. In general, the preset duration should be long enough that the text corresponding to the user voice signal contains two or more text units.
Correspondingly, the step of determining the end-of-question mute time according to the speech rate information and the preset mute time setting rule may comprise:
Determining, according to the speech rate information and the preset mute time setting rule, the end-of-question mute time corresponding to the currently obtained user voice signal.
When the server detects that the duration of the user voice signal has reached the preset duration, it determines the speech rate information of the signal. It then determines, according to the speech rate information and the preset mute time setting rule, the end-of-question mute time for that signal, that is, for the currently obtained user voice signal. This can be understood as the end-of-question mute time for the sentence the user is currently saying.
For example, suppose the preset duration is 500 milliseconds. When the server detects that the duration of the user voice signal has reached 500 milliseconds, it determines the speech-rate information of the signal and applies the preset silence-time setting rule to determine the corresponding end-of-question silence time, say 600 milliseconds. Because the server monitors the silence in the user voice signal while receiving it, once the monitored silence reaches 600 milliseconds the server judges that the user's question has ended and proceeds with recognition, parsing and other processing in order to respond to the user instruction corresponding to the user voice signal.
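The real-time flow in this example can be sketched as a small loop. This is a minimal illustration, not the patented implementation: it assumes audio arrives as fixed 10 ms frames already labeled speech/non-speech by some voice-activity detector, and `detect_end_of_question` and `silence_rule` are hypothetical names. `silence_rule` stands in for "determine the speech-rate information and apply the preset rule" and is called once, when the preset duration is reached.

```python
from typing import Callable, Iterable, Optional

FRAME_MS = 10  # assumed frame length; the patent does not specify framing


def detect_end_of_question(
    frames: Iterable[bool],
    preset_duration_ms: int,
    silence_rule: Callable[[int], int],
) -> Optional[int]:
    """Return the index of the frame at which the question is judged over.

    frames: per-frame speech flags (True = speech) from a hypothetical VAD.
    preset_duration_ms: signal length after which the end-of-question
        silence time is determined.
    silence_rule: hypothetical stand-in for the speech-rate rule; called once
        with the elapsed duration in ms, returns the silence time in ms.
    Returns None if the stream ends before the question is judged over.
    """
    elapsed_ms = 0
    trailing_silence_ms = 0
    end_silence_ms: Optional[int] = None  # not determined yet
    for index, is_speech in enumerate(frames):
        elapsed_ms += FRAME_MS
        # Any speech frame resets the running silence counter.
        trailing_silence_ms = 0 if is_speech else trailing_silence_ms + FRAME_MS
        if end_silence_ms is None and elapsed_ms >= preset_duration_ms:
            end_silence_ms = silence_rule(elapsed_ms)
        if end_silence_ms is not None and trailing_silence_ms >= end_silence_ms:
            return index  # hand the utterance to recognition and response
    return None
```

With the figures from the example (500 ms preset duration, a rule that yields 600 ms), 600 ms of speech followed by silence is judged finished at the frame where 600 ms of silence has accumulated.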
Thus, in this embodiment, the server can dynamically set the end-of-question silence time for each utterance according to the user voice signal obtained in real time. Even when different users share the same smart voice terminal, each utterance can be responded to accurately according to the speech-rate characteristics of the speaker, further improving the user experience.
For the case where the speech-rate information is an average speech rate, as one implementation of the embodiment of the present invention, as shown in Fig. 2, the step of determining the speech-rate information of the user voice signal may include:
S201: obtaining the duration of the user voice signal;
The server can obtain the duration of the user voice signal by, for example, recording the duration of the signal while receiving it. Since any method in the voice-signal field for obtaining the duration of a voice signal may be used, the specific method is not limited or described here.
If the user voice signals obtained by the server are those collected by the smart voice terminal within the above preset period of time, the server may obtain the total duration of all or some of them. For example, if the server obtains the user voice signals collected over one week, the duration it obtains may be the total duration of all user voice signals in that week, or the total duration of a subset of them.
If the server obtains the user voice signal currently being collected by the smart voice terminal, the duration it obtains is the duration of that current signal.
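Step S201 leaves the timing method open. As one hedged illustration, assuming the server stores each collected utterance as a WAV clip (the patent names no format), the duration follows from the frame count and sample rate reported by Python's standard `wave` module, and the preset-period case is the sum over the selected clips. `total_duration_seconds` is a hypothetical helper name.

```python
import io
import wave
from typing import Iterable


def total_duration_seconds(wav_clips: Iterable[bytes]) -> float:
    """Sum the durations of WAV-encoded clips, in seconds.

    One clip covers the current-utterance case; several clips cover the
    "all or part of the signals collected over a preset period" case.
    """
    total = 0.0
    for clip in wav_clips:
        with wave.open(io.BytesIO(clip), "rb") as wav:
            # Duration = number of sample frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
    return total
```

For a live stream the same quantity can be accumulated frame by frame instead of read from a finished file.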
S202: performing speech recognition on the user voice signal to obtain the number of characters corresponding to the user voice signal;
Next, the server performs speech recognition on the obtained user voice signal to obtain the corresponding number of characters. Speech recognition yields the text content of the user voice signal, from which the number of characters follows directly.
For example, if speech recognition yields the text "play the next song" for a user voice signal (five characters in the original Chinese), the server can determine that the signal corresponds to 5 characters.
Note that the character count obtained by the server must correspond to the same user voice signal whose duration was determined in step S201. That is, if the duration was computed over a subset of the voice signals within the preset period, the character count is obtained by performing speech recognition on that same subset.
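Counting characters in step S202 is simple once a transcript exists. A minimal sketch, assuming the recognizer returns a plain-text transcript and that whitespace and punctuation should not count toward the speech rate (a choice the patent does not spell out): each remaining Unicode character is one unit, which suits Chinese text; for space-delimited languages such as English, counting words would be the closer analogue. `count_characters` is a hypothetical name.

```python
import unicodedata


def count_characters(transcript: str) -> int:
    """Count spoken characters in a transcript.

    Whitespace and punctuation (Unicode categories starting with "P") are
    skipped, so only characters that were actually pronounced are counted.
    """
    return sum(
        1
        for ch in transcript
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )
```

For instance, the Chinese transcript 你在干什么 ("what are you doing") counts as 5 characters.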
S203: determining the average speech rate of the user voice signal according to the number of characters and the duration.
After obtaining the number of characters, the server determines the average speech rate of the user voice signal from that number and the above duration. Speech rate is how fast the user speaks and can be expressed as the number of characters spoken per unit time, i.e. the quotient of the number of characters and the duration of the user voice signal.
For example, if the user voice signal corresponds to 6 characters and lasts 3 seconds, its average speech rate is 6/3 = 2 characters per second; that is, the user speaks an average of 2 characters each second.
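Step S203 reduces to a quotient. The sketch below is one way to write it, assuming character counts and durations have already been obtained for the same signals (as the text requires); it accepts several utterances so that the preset-period case divides total characters by total duration. `average_speech_rate` is a hypothetical name.

```python
from typing import Sequence, Tuple


def average_speech_rate(utterances: Sequence[Tuple[int, float]]) -> float:
    """Average speech rate in characters per second.

    utterances: (character_count, duration_seconds) pairs for the *same*
    signals. One pair is the current-utterance case; several pairs give the
    rate over a preset period as total characters / total duration.
    """
    total_chars = sum(count for count, _ in utterances)
    total_seconds = sum(duration for _, duration in utterances)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    return total_chars / total_seconds
```

`average_speech_rate([(6, 3.0)])` reproduces the worked example and returns 2.0. Pooling totals weights longer utterances more heavily than averaging per-utterance rates would.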
Thus, in this embodiment, the server obtains the duration of the user voice signal, performs speech recognition on it to obtain the corresponding number of characters, and then determines the average speech rate from the number of characters and the duration. This determines the speech-rate information quickly and accurately, improving the speed and accuracy of the subsequent determination of the end-of-question silence time.
For the case where the speech-rate information is an average speech rate, as one implementation of the embodiment of the present invention, the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining the end-of-question silence time according to how the average speech rate compares with preset speech-rate thresholds.
In this embodiment, the server determines the end-of-question silence time by comparing the average speech rate calculated in the embodiment shown in Fig. 2 with preset speech-rate thresholds. The thresholds can be set from factors such as statistics on the average speech rate of ordinary speakers, for example 3, 4 or 5 characters per second, and are not specifically limited here. There may be one threshold or several; either is reasonable and neither is specifically limited here.
In this case, as one implementation of the embodiment of the present invention, the preset speech-rate thresholds may include a first preset speech-rate threshold and a second preset speech-rate threshold, where the first preset speech-rate threshold is less than the second preset speech-rate threshold.
In one implementation, the average speech rate of people who generally speak slowly can be used as the first preset speech-rate threshold, and the average speech rate of people who generally speak quickly as the second preset speech-rate threshold.
Correspondingly, the step of determining the end-of-question silence time according to how the average speech rate compares with the preset speech-rate thresholds may include:
when the average speech rate is less than the first preset speech-rate threshold, determining that the end-of-question silence time is a first silence time; when the average speech rate is greater than the first preset speech-rate threshold and less than the second preset speech-rate threshold, determining that the end-of-question silence time is a second silence time; and when the average speech rate is greater than the second preset speech-rate threshold, determining that the end-of-question silence time is a third silence time. The first silence time is greater than the second silence time, and the second silence time is greater than the third silence time.
To determine the end-of-question silence time from this comparison, the server compares the average speech rate with the first and second preset speech-rate thresholds. If the average speech rate is less than the first threshold, the user speaks slowly, and the server determines the end-of-question silence time to be the first silence time. The first silence time should be relatively long so that the smart voice terminal does not cut in before the user has finished speaking. It can generally be 700 milliseconds, which keeps the terminal from responding prematurely without making responses too slow.
If the average speech rate is greater than the first threshold and less than the second, the user speaks at a moderate pace, neither very fast nor very slow, and the server determines the end-of-question silence time to be the second silence time. The second silence time should be neither too long nor too short; it can generally be 500 milliseconds, which likewise keeps the terminal from responding prematurely while keeping responses reasonably fast.
If the average speech rate is greater than the second threshold, the user speaks quickly, and the server determines the end-of-question silence time to be the third silence time. The third silence time should be relatively short: while still keeping the terminal from responding prematurely, it makes responses as fast as possible so the user does not wait long after finishing speaking. It can generally be 300 milliseconds.
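The three-tier rule for average speech rate can be sketched as a single function. The 700/500/300 ms silence times are the "generally" values given above; the default thresholds of 3 and 5 characters per second are an assumption picked from the example threshold values, and the patent does not say which tier a rate exactly equal to a threshold falls into, so this sketch assigns boundary values to the middle tier. The function name is hypothetical.

```python
def silence_from_average_rate(
    rate_cps: float,
    first_threshold_cps: float = 3.0,   # assumed "slow speaker" threshold
    second_threshold_cps: float = 5.0,  # assumed "fast speaker" threshold
    first_silence_ms: int = 700,
    second_silence_ms: int = 500,
    third_silence_ms: int = 300,
) -> int:
    """Map an average speech rate (characters/second) to an end-of-question
    silence time in milliseconds."""
    if not first_threshold_cps < second_threshold_cps:
        raise ValueError("first threshold must be below the second")
    if rate_cps < first_threshold_cps:
        return first_silence_ms  # slow speaker: wait longer
    if rate_cps > second_threshold_cps:
        return third_silence_ms  # fast speaker: respond sooner
    return second_silence_ms  # moderate pace (boundary values land here)
```

With these defaults, the 2 characters-per-second speaker from the earlier example would get a 700 ms silence time.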
Thus, in this embodiment, the server sets three end-of-question silence times of different lengths according to how the average speech rate compares with the first and second preset speech-rate thresholds. This keeps the smart voice terminal from responding prematurely while making responses as fast as possible, adapts to the speaking habits of different users, and further improves the user experience.
For the case where the speech-rate information is the average interval time between characters, as one implementation of the embodiment of the present invention, as shown in Fig. 3, the step of determining the speech-rate information of the user voice signal may include:
S301: performing speech recognition on the user voice signal to obtain the interval times between adjacent characters in the text corresponding to the user voice signal;
While receiving the user voice signal, the server performs speech recognition on it to obtain the interval times between adjacent characters in the corresponding text. Note that the interval time between adjacent characters here means the interval between the units, each forming one character, that are distinguished according to the individual speaker's habits.
For example, if the text corresponding to the user voice signal is "what are you doing" (five characters in the original Chinese), the interval times between adjacent characters are the four intervals between the first and second characters, the second and third, the third and fourth, and the fourth and fifth.
If the user voice signals obtained by the server are those collected by the smart voice terminal within the above preset period of time, the server may obtain the interval times between adjacent characters in the text corresponding to all or some of them. For example, if the server obtains the user voice signals collected over 3 days, the interval times it obtains are those between adjacent characters in the text corresponding to all, or a subset, of the user voice signals in those 3 days.
If the server obtains the user voice signal currently being collected by the smart voice terminal, the interval times it obtains are those between adjacent characters in the text corresponding to that current signal.
The interval times between adjacent characters may be obtained, for example, from the times of the peaks and troughs in the spectrum or waveform of the voice signal; the specific method is not limited here.
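The text suggests reading intervals off waveform peaks and troughs. As a swapped-in alternative, not the patent's stated method, the sketch below assumes the recognizer reports a start timestamp for each character and measures the interval onset to onset; whether the patent intends onset-to-onset spacing or the silent gap between characters is not stated, so that choice is also an assumption. `adjacent_intervals_ms` is a hypothetical name.

```python
from typing import List, Sequence


def adjacent_intervals_ms(onsets_ms: Sequence[float]) -> List[float]:
    """Intervals between adjacent characters, measured onset to onset.

    onsets_ms: start time of each recognized character in milliseconds, in
    spoken order, assumed to come from a recognizer that reports timestamps.
    """
    intervals = [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]
    if any(gap < 0 for gap in intervals):
        raise ValueError("onsets must be in non-decreasing order")
    return intervals
```

Five character onsets at 0, 400, 850, 1270 and 1705 ms give four intervals of 400, 450, 420 and 435 ms.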
S302: calculating the average interval time of the user voice signal according to the interval times.
After determining the above interval times, the server calculates the average interval time of the user voice signal. For example, for the text "what are you doing" above, if the four intervals between adjacent characters are 400, 450, 420 and 435 milliseconds respectively, the average interval time of the user voice signal is (400 + 450 + 420 + 435)/4 = 426.25 milliseconds.
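Step S302 is an arithmetic mean. A minimal sketch, assuming interval lists have already been obtained per utterance: intervals from several utterances (the preset-period case) are pooled so that every interval counts equally, which is one reasonable reading of the text rather than a stated requirement. `mean_interval_ms` is a hypothetical name; `statistics.fmean` requires Python 3.8 or later.

```python
from itertools import chain
from statistics import fmean
from typing import Iterable, Sequence


def mean_interval_ms(per_utterance: Iterable[Sequence[float]]) -> float:
    """Average interval time between adjacent characters, in milliseconds.

    per_utterance: one list of adjacent-character intervals per utterance;
    all intervals are pooled and weighted equally.
    """
    pooled = list(chain.from_iterable(per_utterance))
    if not pooled:
        raise ValueError("no intervals to average")
    return fmean(pooled)
```

`mean_interval_ms([[400, 450, 420, 435]])` reproduces the worked example and returns 426.25.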
Thus, in this embodiment, the server performs speech recognition on the user voice signal to obtain the interval times between adjacent characters in the corresponding text, and then determines the average interval time of the user voice signal from them. This determines the speech-rate information quickly and accurately, improving the speed and accuracy of the subsequent determination of the end-of-question silence time.
For the case where the speech-rate information is the average interval time, as one implementation of the embodiment of the present invention, the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining the end-of-question silence time according to how the average interval time compares with preset time thresholds.
In this embodiment, the server determines the end-of-question silence time by comparing the average interval time calculated in the embodiment shown in Fig. 3 with preset time thresholds. The thresholds can be set from factors such as statistics on the interval between characters when ordinary people speak, for example 350, 400 or 450 milliseconds, and are not specifically limited here. There may be one threshold or several; either is reasonable and neither is specifically limited here.
In this case, as one implementation of the embodiment of the present invention, the preset time thresholds may include a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold.
In one implementation, the average interval between characters for people who generally speak slowly can be used as the first preset time threshold, and the average interval between characters for people who generally speak quickly as the second preset time threshold.
Correspondingly, the step of determining the end-of-question silence time according to how the average interval time compares with the preset time thresholds may include:
when the average interval time is greater than the first preset time threshold, determining that the end-of-question silence time is a fourth silence time; when the average interval time is less than the first preset time threshold and greater than the second preset time threshold, determining that the end-of-question silence time is a fifth silence time; and when the average interval time is less than the second preset time threshold, determining that the end-of-question silence time is a sixth silence time. The fourth silence time is greater than the fifth silence time, and the fifth silence time is greater than the sixth silence time.
To determine the end-of-question silence time from this comparison, the server compares the average interval time with the first and second preset time thresholds. If the average interval time is greater than the first threshold, the intervals between characters are long, i.e. the user speaks slowly, and the server determines the end-of-question silence time to be the fourth silence time. The fourth silence time should be relatively long so that the smart voice terminal does not cut in before the user has finished speaking. It can generally be 700 milliseconds, which keeps the terminal from responding prematurely without making responses too slow.
If the average interval time is less than the first threshold and greater than the second, the intervals between characters are moderate, neither very long nor very short, and the server determines the end-of-question silence time to be the fifth silence time. The fifth silence time should be neither too long nor too short; it can generally be 500 milliseconds, which likewise keeps the terminal from responding prematurely while keeping responses reasonably fast.
If the average interval time is less than the second threshold, the intervals between characters are short, and the server determines the end-of-question silence time to be the sixth silence time. The sixth silence time should be relatively short: while still keeping the terminal from responding prematurely, it makes responses as fast as possible so the user does not wait long after finishing speaking. It can generally be 300 milliseconds.
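The interval-based rule mirrors the speech-rate rule with the comparisons reversed, since a longer average interval means a slower speaker. In this sketch the 700/500/300 ms silence times are the "generally" values above; the default thresholds of 450 ms and 350 ms are an assumption picked from the example threshold values, and values exactly equal to a threshold are assigned to the middle tier because the patent leaves that case open. The function name is hypothetical.

```python
def silence_from_mean_interval(
    mean_interval_ms: float,
    first_threshold_ms: float = 450.0,   # assumed "slow speaker" threshold
    second_threshold_ms: float = 350.0,  # assumed "fast speaker" threshold
    fourth_silence_ms: int = 700,
    fifth_silence_ms: int = 500,
    sixth_silence_ms: int = 300,
) -> int:
    """Map an average inter-character interval (ms) to an end-of-question
    silence time in milliseconds."""
    if not first_threshold_ms > second_threshold_ms:
        raise ValueError("first threshold must exceed the second")
    if mean_interval_ms > first_threshold_ms:
        return fourth_silence_ms  # long gaps, slow speaker: wait longer
    if mean_interval_ms < second_threshold_ms:
        return sixth_silence_ms  # short gaps, fast speaker: respond sooner
    return fifth_silence_ms  # moderate gaps (boundary values land here)
```

With these defaults, the 426.25 ms average from the worked example falls in the middle tier and yields a 500 ms silence time.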
Thus, in this embodiment, the server sets three end-of-question silence times of different lengths according to how the average interval time compares with the first and second preset time thresholds. This keeps the smart voice terminal from responding prematurely while making responses as fast as possible, adapts to the speaking habits of different users, and further improves the user experience.
As one implementation of the embodiment of the present invention, after the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule, the method may further include:
when it is detected that the silence in the user voice signal currently collected by the smart voice terminal has reached the determined end-of-question silence time, responding to the user instruction corresponding to that user voice signal.
The user instruction is determined according to the semantics of the user voice signal currently collected. For example, if the server determines through speech recognition that the semantics of the current signal is "how is the weather today", the user instruction may be "play the weather forecast"; if the semantics is "play the next song", the user instruction may be "play the next song".
As described above, while receiving the user voice signal sent by the smart voice terminal, the server detects the silence in that signal in real time. When it detects that the silence in the user voice signal currently collected has reached the determined end-of-question silence time, the user's question has ended. The server then performs speech recognition on the received user voice signal, determines its semantics and the corresponding user instruction, and responds to that instruction.
For example, if the user instruction is "play the weather forecast", the server can obtain weather-forecast information from network resources or by other means and send it to the smart voice terminal, which plays it so that the user learns the forecast.
Thus, in this embodiment, when the server detects that the silence in the user voice signal currently collected by the smart voice terminal has reached the determined end-of-question silence time, it responds to the corresponding user instruction. The end of the user's question is judged from the determined end-of-question silence time before the instruction is answered, giving a better user experience.
Corresponding to the above method embodiments, an embodiment of the present invention further provides an apparatus for determining the end-of-question silence time, which is introduced below.
As shown in Fig. 4, an apparatus for determining the end-of-question silence time includes:
a voice signal obtaining module 410, configured to obtain the user voice signal collected by the smart voice terminal;
a speech-rate information determining module 420, configured to determine the speech-rate information of the user voice signal,
where the speech-rate information is information identifying the speech-rate characteristics of the user voice signal; and
a silence time determining module 430, configured to determine the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule.
Thus, in the solution provided by the embodiment of the present invention, the user voice signal collected by the smart voice terminal is obtained first, its speech-rate information is then determined, and finally the end-of-question silence time is determined according to the speech-rate information and the preset silence-time setting rule, where the speech-rate information identifies the speech-rate characteristics of the user voice signal. Determining the end-of-question silence time in this way sets it reasonably according to each user's speech rate, so the smart voice terminal can respond accurately to users with different speech rates, greatly improving its response accuracy and the user experience.
As one implementation of the embodiment of the present invention, the voice signal obtaining module 410 may include:
a real-time obtaining submodule (not shown in Fig. 4), configured to obtain, in real time, the user voice signal collected by the smart voice terminal;
the apparatus may further include:
a preset duration monitoring module (not shown in Fig. 4), configured to monitor, before the speech-rate information of the user voice signal is determined, that the duration of the user voice signal reaches the preset duration;
and the silence time determining module 430 may include:
a silence time determining submodule (not shown in Fig. 4), configured to determine, according to the speech-rate information and the preset silence-time setting rule, the end-of-question silence time corresponding to the user voice signal currently obtained.
As one implementation of the embodiment of the present invention, the speech-rate information may be an average speech rate;
the speech-rate information determining module may include:
a duration obtaining submodule (not shown in Fig. 4), configured to obtain the duration of the user voice signal;
a character count determining submodule (not shown in Fig. 4), configured to perform speech recognition on the user voice signal to obtain the number of characters corresponding to the user voice signal; and
an average speech rate determining submodule (not shown in Fig. 4), configured to determine the average speech rate of the user voice signal according to the number of characters and the duration.
As one implementation of the embodiment of the present invention, the silence time determining module 430 may include:
a first determining submodule (not shown in Fig. 4), configured to determine the end-of-question silence time according to how the average speech rate compares with the preset speech-rate thresholds.
As one implementation of the embodiment of the present invention, the preset speech-rate thresholds may include a first preset speech-rate threshold and a second preset speech-rate threshold, where the first preset speech-rate threshold is less than the second preset speech-rate threshold;
the first determining submodule may include:
a first determining unit (not shown in Fig. 4), configured to determine that the end-of-question silence time is a first silence time when the average speech rate is less than the first preset speech-rate threshold;
a second determining unit (not shown in Fig. 4), configured to determine that the end-of-question silence time is a second silence time when the average speech rate is greater than the first preset speech-rate threshold and less than the second preset speech-rate threshold; and
a third determining unit (not shown in Fig. 4), configured to determine that the end-of-question silence time is a third silence time when the average speech rate is greater than the second preset speech-rate threshold, where the first silence time is greater than the second silence time, and the second silence time is greater than the third silence time.
As one implementation of the embodiment of the present invention, the speech-rate information may be the average interval time between characters;
the speech-rate information determining module may include:
an interval time determining submodule (not shown in Fig. 4), configured to perform speech recognition on the user voice signal to obtain the interval times between adjacent characters in the text corresponding to the user voice signal; and
an average interval time determining submodule (not shown in Fig. 4), configured to calculate the average interval time of the user voice signal according to the interval times.
As one implementation of the embodiment of the present invention, the silence time determining module 430 may include:
a second determining submodule (not shown in Fig. 4), configured to determine the end-of-question silence time according to how the average interval time compares with the preset time thresholds.
As one implementation of the embodiment of the present invention, the preset time thresholds may include a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold;
the second determining submodule may include:
a fourth determining unit (not shown in Fig. 4), configured to determine that the end-of-question silence time is a fourth silence time when the average interval time is greater than the first preset time threshold;
a fifth determining unit (not shown in Fig. 4), configured to determine that the end-of-question silence time is a fifth silence time when the average interval time is less than the first preset time threshold and greater than the second preset time threshold; and
a sixth determining unit (not shown in Fig. 4), configured to determine that the end-of-question silence time is a sixth silence time when the average interval time is less than the second preset time threshold.
The fourth silence time is greater than the fifth silence time, and the fifth silence time is greater than the sixth silence time.
As one implementation of the embodiment of the present invention, the apparatus may further include:
an instruction responding module (not shown in Fig. 4), configured to, after the end-of-question silence time has been determined according to the speech-rate information and the preset silence-time setting rule, respond to the user instruction corresponding to the user voice signal currently collected by the smart voice terminal when it is detected that the silence in that signal has reached the determined end-of-question silence time,
where the user instruction is determined according to the semantics of the user voice signal currently collected.
An embodiment of the present invention further provides an electronic device which, as shown in Fig. 5, includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with one another via the communication bus 504;
the memory 503 is configured to store a computer program;
the processor 501 is configured to implement the following steps when executing the program stored in the memory 503:
obtaining the user voice signal collected by the smart voice terminal;
determining the speech-rate information of the user voice signal, where the speech-rate information is information identifying the speech-rate characteristics of the user voice signal; and
determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule.
Thus, in the solution provided by the embodiment of the present invention, the electronic device first obtains the user voice signal collected by the smart voice terminal, then determines its speech-rate information, and finally determines the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule, where the speech-rate information identifies the speech-rate characteristics of the user voice signal. Determining the end-of-question silence time in this way sets it reasonably according to each user's speech rate, so the smart voice terminal can respond accurately to users with different speech rates, greatly improving its response accuracy and the user experience.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in the figure, which does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one magnetic disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The step of obtaining the user voice signal captured by the smart voice terminal may include:
obtaining, in real time, the user voice signal captured by the smart voice terminal;
before the step of determining the speech-rate information of the user voice signal, the following step may be included:
monitoring that the duration of the user voice signal has reached a preset duration;
and the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining, according to the speech-rate information and the preset silence-time setting rule, the end-of-question silence time corresponding to the currently obtained user voice signal.
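The real-time variant above can be illustrated by buffering frames as they arrive and signalling once the buffered signal reaches the preset duration, at which point the speech rate of the signal obtained so far would be computed. This is a minimal Python sketch under assumed conditions: `RealtimeDurationMonitor` is a hypothetical name, and the 16 kHz sample rate, 100 ms frames and 1 s preset duration are illustrative values not given in the patent.

```python
class RealtimeDurationMonitor:
    """Hypothetical helper: buffers audio frames obtained in real time and
    reports once the buffered user voice signal reaches a preset duration."""

    def __init__(self, sample_rate: int, preset_duration_s: float) -> None:
        self.sample_rate = sample_rate
        self.preset_samples = int(round(preset_duration_s * sample_rate))
        self.samples: list = []

    def push(self, frame: list) -> bool:
        """Append one frame; return True once the preset duration is reached."""
        self.samples.extend(frame)
        return len(self.samples) >= self.preset_samples

    def duration_s(self) -> float:
        """Duration of the signal obtained so far, in seconds."""
        return len(self.samples) / self.sample_rate


# Usage: ten 100 ms frames at 16 kHz (illustrative values).
monitor = RealtimeDurationMonitor(sample_rate=16000, preset_duration_s=1.0)
for _ in range(10):
    if monitor.push([0.0] * 1600):
        break  # here the speech rate of the signal so far would be determined
```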
The above speech-rate information may be an average speech rate;
the step of determining the speech-rate information of the user voice signal may include:
obtaining the duration of the user voice signal;
performing speech recognition on the user voice signal to obtain the number of characters in the text corresponding to the user voice signal;
determining the average speech rate of the user voice signal according to the number of characters and the duration.
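A natural reading of "according to the number of characters and the duration" is characters per second, i.e. the recognized character count divided by the signal duration. The Python sketch below assumes that interpretation and assumes the speech-recognition result is already available as a string; `average_speech_rate` is a hypothetical name, and counting every non-whitespace character is a choice that suits Chinese text (word counts would be more natural for space-delimited languages).

```python
def average_speech_rate(transcript: str, duration_s: float) -> float:
    """Average speech rate in characters per second.

    `transcript` is assumed to be the speech-recognition result for the
    user voice signal; every non-whitespace character is counted."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    num_chars = sum(1 for ch in transcript if not ch.isspace())
    return num_chars / duration_s


# 7 characters recognized over 2 seconds -> 3.5 characters per second
rate = average_speech_rate("今天天气怎么样", 2.0)
```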
The step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining the end-of-question silence time according to the magnitude relationship between the average speech rate and preset speech-rate thresholds.
The preset speech-rate thresholds may include a first preset speech-rate threshold and a second preset speech-rate threshold, where the first preset speech-rate threshold is less than the second preset speech-rate threshold;
the step of determining the end-of-question silence time according to the magnitude relationship between the average speech rate and the preset speech-rate thresholds may include:
when the average speech rate is less than the first preset speech-rate threshold, determining that the end-of-question silence time is a first silence time;
when the average speech rate is greater than the first preset speech-rate threshold and less than the second preset speech-rate threshold, determining that the end-of-question silence time is a second silence time;
when the average speech rate is greater than the second preset speech-rate threshold, determining that the end-of-question silence time is a third silence time, where the first silence time is greater than the second silence time, and the second silence time is greater than the third silence time.
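The three-band rule above, in which slower speakers get a longer silence window, can be sketched as follows. The patent does not state numeric thresholds or silence times, nor which band a rate exactly equal to a threshold belongs to; the defaults below (3 and 5 characters per second; 1.2 s, 0.8 s, 0.5 s) and the tie handling are hypothetical choices for illustration, and `silence_from_average_rate` is a hypothetical name.

```python
from typing import Tuple


def silence_from_average_rate(
    rate: float,
    first_threshold: float = 3.0,   # hypothetical, chars/s
    second_threshold: float = 5.0,  # hypothetical, chars/s
    silences: Tuple[float, float, float] = (1.2, 0.8, 0.5),  # hypothetical, s
) -> float:
    """Map an average speech rate to an end-of-question silence time."""
    if not first_threshold < second_threshold:
        raise ValueError("first threshold must be below second threshold")
    first, second, third = silences
    if not first > second > third:
        raise ValueError("silence times must be strictly decreasing")
    if rate < first_threshold:
        return first
    if rate < second_threshold:
        # Assumption: a rate exactly equal to the first threshold lands here.
        return second
    # Assumption: a rate exactly equal to the second threshold lands here.
    return third


slow, medium, fast = (silence_from_average_rate(r) for r in (2.0, 4.0, 6.0))
```

The validation checks enforce the ordering constraints the description states (first threshold below the second, silence times strictly decreasing), so a misconfigured rule fails loudly instead of silently inverting the behaviour.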
The above speech-rate information may alternatively be the average interval between adjacent words;
the step of determining the speech-rate information of the user voice signal may include:
performing speech recognition on the user voice signal to obtain the interval times between adjacent words in the text corresponding to the user voice signal;
calculating, according to the interval times, the average inter-word interval corresponding to the user voice signal.
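One way to obtain the interval times is from per-word timestamps produced by the recognizer, taking each gap as the next word's start minus the current word's end and then averaging the gaps. The Python sketch below assumes such `(start_s, end_s)` timestamps are available, which the patent does not specify; `mean_inter_word_interval` is a hypothetical name, and clamping negative gaps from overlapping timestamps to zero is an illustrative choice.

```python
from typing import Sequence, Tuple


def mean_inter_word_interval(word_times: Sequence[Tuple[float, float]]) -> float:
    """Average gap, in seconds, between adjacent recognized words.

    `word_times` is assumed to hold (start_s, end_s) for each recognized
    word or character, in order. Negative gaps are clamped to zero."""
    if len(word_times) < 2:
        raise ValueError("need at least two recognized words")
    gaps = [
        max(0.0, nxt_start - cur_end)
        for (_, cur_end), (nxt_start, _) in zip(word_times, word_times[1:])
    ]
    return sum(gaps) / len(gaps)


# Gaps of 0.1 s and 0.3 s between three words -> mean of about 0.2 s
interval = mean_inter_word_interval([(0.0, 0.2), (0.3, 0.5), (0.8, 1.0)])
```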
The step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining the end-of-question silence time according to the magnitude relationship between the average inter-word interval and preset time thresholds.
The preset time thresholds may include a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold;
the step of determining the end-of-question silence time according to the magnitude relationship between the average inter-word interval and the preset time thresholds may include:
when the average inter-word interval is greater than the first preset time threshold, determining that the end-of-question silence time is a fourth silence time;
when the average inter-word interval is less than the first preset time threshold and greater than the second preset time threshold, determining that the end-of-question silence time is a fifth silence time;
when the average inter-word interval is less than the second preset time threshold, determining that the end-of-question silence time is a sixth silence time, where the fourth silence time is greater than the fifth silence time, and the fifth silence time is greater than the sixth silence time.
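The interval-based rule mirrors the rate-based one with the comparisons reversed: a longer average gap indicates a slower speaker and therefore a longer silence window. As before, the threshold values (0.4 s and 0.2 s), the silence times and the handling of values exactly on a threshold are hypothetical choices for this Python sketch, and `silence_from_mean_interval` is a hypothetical name.

```python
from typing import Tuple


def silence_from_mean_interval(
    interval: float,
    first_threshold: float = 0.4,   # hypothetical, seconds
    second_threshold: float = 0.2,  # hypothetical, seconds
    silences: Tuple[float, float, float] = (1.2, 0.8, 0.5),  # 4th, 5th, 6th; hypothetical
) -> float:
    """Map an average inter-word interval to an end-of-question silence time."""
    if not first_threshold > second_threshold:
        raise ValueError("first threshold must exceed second threshold")
    fourth, fifth, sixth = silences
    if not fourth > fifth > sixth:
        raise ValueError("silence times must be strictly decreasing")
    if interval > first_threshold:
        return fourth
    if interval > second_threshold:
        # Assumption: an interval exactly equal to the first threshold lands here.
        return fifth
    # Assumption: an interval exactly equal to the second threshold lands here.
    return sixth


long_gap, mid_gap, short_gap = (silence_from_mean_interval(i) for i in (0.5, 0.3, 0.1))
```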
After the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule, the above method may further include:
when it is detected that the silence following the user voice signal currently being captured by the smart voice terminal has reached the determined end-of-question silence time, responding to the user instruction corresponding to that user voice signal, where the user instruction is an instruction determined from the semantics of the user voice signal currently being captured.
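Detecting that the trailing silence has reached the determined end-of-question silence time can be sketched as a per-frame counter that resets whenever speech resumes. This Python sketch assumes a separate voice-activity decision for each frame (the patent does not say how silence is detected) and 20 ms frames; `EndOfQuestionDetector` is a hypothetical name. Times are kept in integer milliseconds so repeated additions do not drift, and the caller, not this class, would then respond to the user instruction.

```python
class EndOfQuestionDetector:
    """Hypothetical helper: tracks trailing silence frame by frame and reports
    when it reaches the end-of-question silence time."""

    def __init__(self, end_silence_ms: int, frame_ms: int = 20) -> None:
        self.end_silence_ms = end_silence_ms
        self.frame_ms = frame_ms
        self.silent_frames = 0

    def update(self, is_speech: bool) -> bool:
        """Feed one frame's voice-activity decision; return True once the
        trailing silence reaches the end-of-question silence time."""
        if is_speech:
            self.silent_frames = 0  # speech resumed, restart the silence count
            return False
        self.silent_frames += 1
        return self.silent_frames * self.frame_ms >= self.end_silence_ms


# Usage: with a 100 ms end-of-question silence and 20 ms frames, the
# fifth consecutive silent frame after speech triggers the response.
detector = EndOfQuestionDetector(end_silence_ms=100, frame_ms=20)
decisions = [detector.update(v) for v in [True, False, False, False, False, False]]
```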
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining a user voice signal captured by a smart voice terminal;
determining speech-rate information of the user voice signal, where the speech-rate information is information identifying a speech-rate characteristic of the user voice signal;
determining an end-of-question silence time according to the speech-rate information and a preset silence-time setting rule.
It can be seen that, in the solution provided by the embodiments of the present invention, when the computer program is executed by a processor, it first obtains the user voice signal captured by the smart voice terminal, then determines the speech-rate information of the user voice signal, and finally determines the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule, where the speech-rate information is information identifying a speech-rate characteristic of the user voice signal. Determining the end-of-question silence time in this way allows a suitable silence time to be set according to each user's speech-rate characteristic, so that the smart voice terminal can respond accurately to users who speak at different rates, improving both its response accuracy and the user experience.
The step of obtaining the user voice signal captured by the smart voice terminal may include:
obtaining, in real time, the user voice signal captured by the smart voice terminal;
before the step of determining the speech-rate information of the user voice signal, the following step may be included:
monitoring that the duration of the user voice signal has reached a preset duration;
and the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining, according to the speech-rate information and the preset silence-time setting rule, the end-of-question silence time corresponding to the currently obtained user voice signal.
The above speech-rate information may be an average speech rate;
the step of determining the speech-rate information of the user voice signal may include:
obtaining the duration of the user voice signal;
performing speech recognition on the user voice signal to obtain the number of characters in the text corresponding to the user voice signal;
determining the average speech rate of the user voice signal according to the number of characters and the duration.
The step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining the end-of-question silence time according to the magnitude relationship between the average speech rate and preset speech-rate thresholds.
The preset speech-rate thresholds may include a first preset speech-rate threshold and a second preset speech-rate threshold, where the first preset speech-rate threshold is less than the second preset speech-rate threshold;
the step of determining the end-of-question silence time according to the magnitude relationship between the average speech rate and the preset speech-rate thresholds may include:
when the average speech rate is less than the first preset speech-rate threshold, determining that the end-of-question silence time is a first silence time;
when the average speech rate is greater than the first preset speech-rate threshold and less than the second preset speech-rate threshold, determining that the end-of-question silence time is a second silence time;
when the average speech rate is greater than the second preset speech-rate threshold, determining that the end-of-question silence time is a third silence time, where the first silence time is greater than the second silence time, and the second silence time is greater than the third silence time.
The above speech-rate information may alternatively be the average interval between adjacent words;
the step of determining the speech-rate information of the user voice signal may include:
performing speech recognition on the user voice signal to obtain the interval times between adjacent words in the text corresponding to the user voice signal;
calculating, according to the interval times, the average inter-word interval corresponding to the user voice signal.
The step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule may include:
determining the end-of-question silence time according to the magnitude relationship between the average inter-word interval and preset time thresholds.
The preset time thresholds may include a first preset time threshold and a second preset time threshold, where the first preset time threshold is greater than the second preset time threshold;
the step of determining the end-of-question silence time according to the magnitude relationship between the average inter-word interval and the preset time thresholds may include:
when the average inter-word interval is greater than the first preset time threshold, determining that the end-of-question silence time is a fourth silence time;
when the average inter-word interval is less than the first preset time threshold and greater than the second preset time threshold, determining that the end-of-question silence time is a fifth silence time;
when the average inter-word interval is less than the second preset time threshold, determining that the end-of-question silence time is a sixth silence time, where the fourth silence time is greater than the fifth silence time, and the fifth silence time is greater than the sixth silence time.
After the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule, the above method may further include:
when it is detected that the silence following the user voice signal currently being captured by the smart voice terminal has reached the determined end-of-question silence time, responding to the user instruction corresponding to that user voice signal, where the user instruction is an instruction determined from the semantics of the user voice signal currently being captured.
It should be noted that the apparatus, electronic device and computer-readable storage medium embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
It should further be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiments.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A method for determining an end-of-question silence time, characterized in that the method comprises:
obtaining a user voice signal captured by a smart voice terminal;
determining speech-rate information of the user voice signal, where the speech-rate information is information identifying a speech-rate characteristic of the user voice signal;
determining the end-of-question silence time according to the speech-rate information and a preset silence-time setting rule.
2. The method according to claim 1, characterized in that the step of obtaining the user voice signal captured by the smart voice terminal comprises:
obtaining, in real time, the user voice signal captured by the smart voice terminal;
before the step of determining the speech-rate information of the user voice signal, the method comprises:
monitoring that the duration of the user voice signal has reached a preset duration;
the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule comprises:
determining, according to the speech-rate information and the preset silence-time setting rule, the end-of-question silence time corresponding to the currently obtained user voice signal.
3. The method according to claim 1, characterized in that the speech-rate information is an average speech rate;
the step of determining the speech-rate information of the user voice signal comprises:
obtaining the duration of the user voice signal;
performing speech recognition on the user voice signal to obtain the number of characters in the text corresponding to the user voice signal;
determining the average speech rate of the user voice signal according to the number of characters and the duration.
4. The method according to claim 3, characterized in that the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule comprises:
determining the end-of-question silence time according to the magnitude relationship between the average speech rate and preset speech-rate thresholds.
5. The method according to claim 4, characterized in that the preset speech-rate thresholds comprise a first preset speech-rate threshold and a second preset speech-rate threshold, where the first preset speech-rate threshold is less than the second preset speech-rate threshold;
the step of determining the end-of-question silence time according to the magnitude relationship between the average speech rate and the preset speech-rate thresholds comprises:
when the average speech rate is less than the first preset speech-rate threshold, determining that the end-of-question silence time is a first silence time;
when the average speech rate is greater than the first preset speech-rate threshold and less than the second preset speech-rate threshold, determining that the end-of-question silence time is a second silence time;
when the average speech rate is greater than the second preset speech-rate threshold, determining that the end-of-question silence time is a third silence time, where the first silence time is greater than the second silence time, and the second silence time is greater than the third silence time.
6. The method according to claim 1, characterized in that the speech-rate information is the average interval between adjacent words;
the step of determining the speech-rate information of the user voice signal comprises:
performing speech recognition on the user voice signal to obtain the interval times between adjacent words in the text corresponding to the user voice signal;
calculating, according to the interval times, the average inter-word interval corresponding to the user voice signal.
7. The method according to claim 6, characterized in that the step of determining the end-of-question silence time according to the speech-rate information and the preset silence-time setting rule comprises:
determining the end-of-question silence time according to the magnitude relationship between the average inter-word interval and preset time thresholds.
8. An apparatus for determining an end-of-question silence time, characterized in that the apparatus comprises:
a voice signal obtaining module, configured to obtain a user voice signal captured by a smart voice terminal;
a speech-rate information determining module, configured to determine speech-rate information of the user voice signal, where the speech-rate information is information identifying a speech-rate characteristic of the user voice signal;
a silence time determining module, configured to determine the end-of-question silence time according to the speech-rate information and a preset silence-time setting rule.
9. An electronic device, characterized by comprising a processor, a memory and a communication bus, where the processor and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-9 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810083491.9A CN108962283B (en) | 2018-01-29 | 2018-01-29 | Method and device for determining question end mute time and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810083491.9A CN108962283B (en) | 2018-01-29 | 2018-01-29 | Method and device for determining question end mute time and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108962283A true CN108962283A (en) | 2018-12-07 |
CN108962283B CN108962283B (en) | 2020-11-06 |
Family ID=64495448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810083491.9A Active CN108962283B (en) | 2018-01-29 | 2018-01-29 | Method and device for determining question end mute time and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108962283B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109599130A (en) * | 2018-12-10 | 2019-04-09 | 百度在线网络技术(北京)有限公司 | Reception method, device and storage medium |
CN109961787A (en) * | 2019-02-20 | 2019-07-02 | 北京小米移动软件有限公司 | Determine the method and device of acquisition end time |
CN109979474A (en) * | 2019-03-01 | 2019-07-05 | 珠海格力电器股份有限公司 | Speech ciphering equipment and its user speed modification method, device and storage medium |
CN110400576A (en) * | 2019-07-29 | 2019-11-01 | 北京声智科技有限公司 | The processing method and processing device of voice request |
CN110534109A (en) * | 2019-09-25 | 2019-12-03 | 深圳追一科技有限公司 | Audio recognition method, device, electronic equipment and storage medium |
CN110675861A (en) * | 2019-09-26 | 2020-01-10 | 深圳追一科技有限公司 | Method, device and equipment for speech sentence-breaking and storage medium |
CN111292739A (en) * | 2018-12-10 | 2020-06-16 | 珠海格力电器股份有限公司 | Voice control method and device, storage medium and air conditioner |
CN111402931A (en) * | 2020-03-05 | 2020-07-10 | 云知声智能科技股份有限公司 | Voice boundary detection method and system assisted by voice portrait |
CN112037775A (en) * | 2020-09-08 | 2020-12-04 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method, device, equipment and storage medium |
CN112151073A (en) * | 2019-06-28 | 2020-12-29 | 北京声智科技有限公司 | Voice processing method, system, device and medium |
CN112397102A (en) * | 2019-08-14 | 2021-02-23 | 腾讯科技(深圳)有限公司 | Audio processing method and device and terminal |
CN112825248A (en) * | 2019-11-19 | 2021-05-21 | 阿里巴巴集团控股有限公司 | Voice processing method, model training method, interface display method and equipment |
CN113782010A (en) * | 2021-11-10 | 2021-12-10 | 北京沃丰时代数据科技有限公司 | Robot response method, device, electronic equipment and storage medium |
CN114203204A (en) * | 2021-12-06 | 2022-03-18 | 北京百度网讯科技有限公司 | Tail point detection method, device, equipment and storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5826230A (en) * | 1994-07-18 | 1998-10-20 | Matsushita Electric Industrial Co., Ltd. | Speech detection device |
US20020019736A1 (en) * | 2000-06-30 | 2002-02-14 | Hiroyuki Kimura | Voice synthesizing apparatus, voice synthesizing system, voice synthesizing method and storage medium |
US20070118363A1 (en) * | 2004-07-21 | 2007-05-24 | Fujitsu Limited | Voice speed control apparatus |
CN101031958A (en) * | 2005-06-15 | 2007-09-05 | Qnx软件操作系统(威美科)有限公司 | Speech end-pointer |
CN102543063A (en) * | 2011-12-07 | 2012-07-04 | 华南理工大学 | Method for estimating speech speed of multiple speakers based on segmentation and clustering of speakers |
JP2012128440A (en) * | 2012-02-06 | 2012-07-05 | Denso Corp | Voice interactive device |
CN103489454A (en) * | 2013-09-22 | 2014-01-01 | 浙江大学 | Voice endpoint detection method based on waveform morphological characteristic clustering |
CN103617801A (en) * | 2013-12-18 | 2014-03-05 | 联想(北京)有限公司 | Voice detection method and device and electronic equipment |
CN104715761A (en) * | 2013-12-16 | 2015-06-17 | 深圳百科信息技术有限公司 | Audio valid data detection methods and audio valid data detection system |
CN105139849A (en) * | 2015-07-22 | 2015-12-09 | 百度在线网络技术(北京)有限公司 | Speech recognition method and apparatus |
CN106782506A (en) * | 2016-11-23 | 2017-05-31 | 语联网(武汉)信息技术有限公司 | A kind of method that recorded audio is divided into section |
CN107077840A (en) * | 2014-10-20 | 2017-08-18 | 雅马哈株式会社 | Speech synthetic device and method |
CN107145329A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | Apparatus control method, device and smart machine |
US20170345444A1 (en) * | 2016-05-31 | 2017-11-30 | Panasonic Intellectual Property Management Co., Ltd. | Communication apparatus mounted with speech speed conversion device |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109599130A (en) * | 2018-12-10 | 2019-04-09 | 百度在线网络技术(北京)有限公司 | Reception method, device and storage medium |
CN111292739A (en) * | 2018-12-10 | 2020-06-16 | 珠海格力电器股份有限公司 | Voice control method and device, storage medium and air conditioner |
CN111292739B (en) * | 2018-12-10 | 2023-03-31 | 珠海格力电器股份有限公司 | Voice control method and device, storage medium and air conditioner |
CN109599130B (en) * | 2018-12-10 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | Sound reception method, device and storage medium |
CN109961787A (en) * | 2019-02-20 | 2019-07-02 | 北京小米移动软件有限公司 | Determine the method and device of acquisition end time |
CN109979474A (en) * | 2019-03-01 | 2019-07-05 | 珠海格力电器股份有限公司 | Speech ciphering equipment and its user speed modification method, device and storage medium |
CN112151073A (en) * | 2019-06-28 | 2020-12-29 | 北京声智科技有限公司 | Voice processing method, system, device and medium |
CN110400576B (en) * | 2019-07-29 | 2021-10-15 | 北京声智科技有限公司 | Voice request processing method and device |
CN110400576A (en) * | 2019-07-29 | 2019-11-01 | 北京声智科技有限公司 | The processing method and processing device of voice request |
CN112397102B (en) * | 2019-08-14 | 2022-07-08 | 腾讯科技(深圳)有限公司 | Audio processing method and device and terminal |
CN112397102A (en) * | 2019-08-14 | 2021-02-23 | 腾讯科技(深圳)有限公司 | Audio processing method and device and terminal |
CN110534109A (en) * | 2019-09-25 | 2019-12-03 | 深圳追一科技有限公司 | Audio recognition method, device, electronic equipment and storage medium |
CN110675861A (en) * | 2019-09-26 | 2020-01-10 | 深圳追一科技有限公司 | Method, device and equipment for speech sentence-breaking and storage medium |
CN112825248A (en) * | 2019-11-19 | 2021-05-21 | 阿里巴巴集团控股有限公司 | Voice processing method, model training method, interface display method and equipment |
CN111402931A (en) * | 2020-03-05 | 2020-07-10 | 云知声智能科技股份有限公司 | Voice boundary detection method and system assisted by voice portrait |
CN111402931B (en) * | 2020-03-05 | 2023-05-26 | 云知声智能科技股份有限公司 | Voice boundary detection method and system assisted by sound image |
CN112037775A (en) * | 2020-09-08 | 2020-12-04 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method, device, equipment and storage medium |
CN113782010A (en) * | 2021-11-10 | 2021-12-10 | 北京沃丰时代数据科技有限公司 | Robot response method, device, electronic equipment and storage medium |
CN114203204A (en) * | 2021-12-06 | 2022-03-18 | 北京百度网讯科技有限公司 | Tail point detection method, device, equipment and storage medium |
CN114203204B (en) * | 2021-12-06 | 2024-04-05 | 北京百度网讯科技有限公司 | Tail point detection method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108962283B (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108962283A (en) | A kind of question terminates the determination method, apparatus and electronic equipment of mute time | |
CN107919130B (en) | Cloud-based voice processing method and device | |
US11127416B2 (en) | Method and apparatus for voice activity detection | |
CN108509619B (en) | Voice interaction method and device | |
CN108877778B (en) | Sound end detecting method and equipment | |
JP6420306B2 (en) | Speech end pointing | |
CN111429895B (en) | Semantic understanding method and device for multi-round interaction and computer storage medium | |
CN108958810A (en) | A kind of user identification method based on vocal print, device and equipment | |
CN105336324A (en) | Language identification method and device | |
CN108874904A (en) | Speech message searching method, device, computer equipment and storage medium | |
CN108810642A (en) | A kind of barrage display methods, device and electronic equipment | |
CN109697981B (en) | Voice interaction method, device, equipment and storage medium | |
CN109119070A (en) | A kind of sound end detecting method, device, equipment and storage medium | |
EP3739583A1 (en) | Dialog device, dialog method, and dialog computer program | |
CN110277092A (en) | A kind of voice broadcast method, device, electronic equipment and readable storage medium storing program for executing | |
CN111145733A (en) | Speech recognition method, speech recognition device, computer equipment and computer readable storage medium | |
CN112256229B (en) | Man-machine voice interaction method and device, electronic equipment and storage medium | |
US20120053937A1 (en) | Generalizing text content summary from speech content | |
CN117253478A (en) | Voice interaction method and related device | |
CN111933149A (en) | Voice interaction method, wearable device, terminal and voice interaction system | |
CN110324566B (en) | Method, device and equipment for testing sound delay in video conference | |
CN111063356B (en) | Electronic equipment response method and system, sound box and computer readable storage medium | |
CN112242135A (en) | Voice data processing method and intelligent customer service device | |
CN109377993A (en) | Intelligent voice system and its voice awakening method and intelligent sound equipment | |
CN112151034B (en) | Voice control method and device of equipment, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |