CN105719670B - Audio processing method and apparatus for an intelligent robot - Google Patents
- Publication number
- CN105719670B (application CN201610028052.9A)
- Authority
- CN
- China
- Prior art keywords
- individual character
- audio
- character time
- recording
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B19/00—Driving, starting, stopping record carriers not specifically of filamentary or web form, or of supports therefor; Control thereof; Control of operating function ; Driving both disc and head
- G11B19/02—Control of operating function, e.g. switching from recording to reproducing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10546—Audio or video recording specifically adapted for audio data
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Toys (AREA)
Abstract
The invention discloses an audio processing method and apparatus for an intelligent robot. The audio processing method includes: an audio information acquisition step, collecting the audio information input by a user; an audio information processing step, preprocessing the audio information to obtain recording time data, the recording time data including an average per-character time t3 and a maximum per-character time t4; a natural language understanding step, parsing the words in the audio information to obtain a natural language understanding result; and a recording time judgment step, evaluating the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result. The invention optimizes the timing of the robot's answer output and improves response accuracy.
Description
Technical field
The present invention relates to the field of speech recognition and processing, and in particular to an audio processing method and apparatus for an intelligent robot.
Background technology
An intelligent robot is an aggregate of many new and advanced technologies. It integrates knowledge from many disciplines, including mechanics, electronics, sensors, computer hardware and software, and artificial intelligence, and involves technologies at many current disciplinary frontiers.
When an intelligent robot interacts with a user, a fixed time is usually preset, and during recording the robot detects whether the user's silent time has reached this preset fixed time. If it has, recording stops.
However, stopping recording by means of a preset fixed time can make the end-of-recording timing inaccurate, which in turn affects the timing of the intelligent robot's answer output and reduces response-time accuracy and user experience.
Summary of the invention
To solve the above problems, the invention provides an audio processing method and apparatus for an intelligent robot, which optimize the timing of the robot's answer output and improve response accuracy.
According to one aspect of the invention, an audio processing method for an intelligent robot is provided, including:

an audio information acquisition step: collecting the audio information input by a user;

an audio information processing step: preprocessing the audio information to obtain recording time data, the recording time data including an average per-character time t3 and a maximum per-character time t4;

a natural language understanding step: parsing the words in the audio information to obtain a natural language understanding result; and

a recording time judgment step: evaluating the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result, and generating an end-recording instruction when the judgment result meets the end-recording conditions.
According to one embodiment of the invention, the recording time judgment step includes:

comparing the zero-volume duration t5 with a preset audio end time t0, and ending the recording when t5 > t0;

comparing the zero-volume duration t5 with the average per-character time t3, and ending the recording when t5 > t3 and the natural language understanding result indicates end of recording; and

comparing the zero-volume duration t5 with the maximum per-character time t4, ending the recording when t5 > t4, and adjusting the value of t0 toward the maximum per-character time t4.
According to one embodiment of the invention, obtaining the maximum per-character time t4 includes:

in a single recording, calculating the per-character time t2 of that recording from the voiced duration t1 and the number of characters obtained by speech recognition; and

obtaining the maximum per-character time t4 from the per-character times t2 of all single recordings in n consecutive recordings.
According to one embodiment of the invention, obtaining the average per-character time t3 includes:

in a single recording, calculating the per-character time t2 of that recording from the voiced duration t1 and the number of characters obtained by speech recognition; and

obtaining the average per-character time t3 from the per-character times t2 of all single recordings in n consecutive recordings.
According to one embodiment of the invention, the per-character time t2 is calculated by one of the following formulas:

t2 = t1/a or t2 = (t1/a + t1/(a-1))/2

where a is the number of characters recognized within the voiced duration t1.
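The two formulas above can be sketched as follows. This is an illustrative sketch only: the function and variable names are not from the patent, t1 is assumed to be in milliseconds, and formula (2) requires a > 1.

```python
def char_time(t1, a):
    """Per-character time t2 for one recording.

    t1: voiced duration of the recording (assumed milliseconds).
    a:  number of characters recognized within t1 (assumed > 1).
    Returns both variants: formula (1), t1/a, and formula (2),
    which averages t1/a with t1/(a-1).
    """
    simple = t1 / a                        # formula (1)
    refined = (t1 / a + t1 / (a - 1)) / 2  # formula (2)
    return simple, refined
```

For example, with t1 = 3000 ms and a = 3 recognized characters, formula (1) gives 1000 ms per character and formula (2) gives 1250 ms.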
According to another aspect of the invention, an audio processing apparatus for an intelligent robot is also provided, including:

an audio information acquisition module, which collects the audio information input by a user;

an audio information processing module, which preprocesses the audio information to obtain recording time data, the recording time data including an average per-character time t3 and a maximum per-character time t4;

a natural language understanding module, which parses the words in the audio information to obtain a natural language understanding result; and

a recording time judgment module, which evaluates the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result, and generates an end-recording instruction when the judgment result meets the end-recording conditions.
According to one embodiment of the invention, the recording time judgment module is used to:

compare the zero-volume duration t5 with a preset audio end time t0, and end the recording when t5 > t0;

compare the zero-volume duration t5 with the average per-character time t3, and end the recording when t5 > t3 and the natural language understanding result indicates end of recording; and

compare the zero-volume duration t5 with the maximum per-character time t4, end the recording when t5 > t4, and adjust the value of t0 toward the maximum per-character time t4.
According to one embodiment of the invention, the audio information processing module includes:

a first per-character time calculating unit, which, in a single recording, calculates the per-character time t2 of that recording from the voiced duration t1 and the number of characters obtained by speech recognition; and

a maximum per-character time calculating unit, which obtains the maximum per-character time t4 from the per-character times t2 of all single recordings in n consecutive recordings.
According to one embodiment of the invention, the audio information processing module includes:

a second per-character time calculating unit, which, in a single recording, calculates the per-character time t2 of that recording from the voiced duration t1 and the number of characters obtained by speech recognition; and

an average per-character time calculating unit, which obtains the average per-character time t3 from the per-character times t2 of all single recordings in n consecutive recordings.
According to a further aspect of the invention, an audio processing apparatus for an intelligent robot is also provided, including:

an audio information acquisition circuit, which collects the audio information input by a user; and

a processor, which preprocesses the audio information to obtain recording time data, the recording time data including an average per-character time t3 and a maximum per-character time t4, parses the words in the audio information to obtain a natural language understanding result, and evaluates the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result, generating an end-recording instruction when the judgment result meets the end-recording conditions,

wherein the processor's evaluation of the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result includes:

comparing the zero-volume duration t5 with a preset audio end time t0, and ending the recording when t5 > t0;

comparing the zero-volume duration t5 with the average per-character time t3, and ending the recording when t5 > t3 and the natural language understanding result indicates end of recording; and

comparing the zero-volume duration t5 with the maximum per-character time t4, ending the recording when t5 > t4, and adjusting the value of t0 toward the maximum per-character time t4.
Beneficial effects of the invention:

The audio processing method and apparatus for an intelligent robot provided by the invention judge multiple parameters that characterize speaking speed and, through those judgments, accurately control the timing at which recording stops, while learning each individual user's speaking speed and word spacing, thereby optimizing the timing of the robot's answer output and improving response accuracy.
Other features and advantages of the invention will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below:
Fig. 1 is a flowchart of a method according to an embodiment of the invention;

Fig. 2 is a flowchart of the steps for determining the average per-character time t3 according to an embodiment of the invention;

Fig. 3 is a flowchart of the steps for determining the maximum per-character time t4 according to an embodiment of the invention;

Fig. 4 is a schematic structural diagram of an audio processing apparatus for an intelligent robot according to an embodiment of the invention;

Fig. 5 is a schematic diagram of the structure for determining the maximum per-character time in the audio information processing module according to an embodiment of the invention;

Fig. 6 is a schematic diagram of the structure for determining the average per-character time in the audio information processing module according to an embodiment of the invention; and

Fig. 7 is a schematic structural diagram of the audio information processing module in an audio processing apparatus for an intelligent robot according to an embodiment of the invention.
Detailed description of the embodiments
The embodiments of the invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the invention and the features within each embodiment may be combined with one another, and the resulting technical solutions all fall within the protection scope of the invention.
Fig. 1 is a flowchart of an audio processing method for an intelligent robot according to an embodiment of the invention. The invention is described in detail below with reference to Fig. 1.
First, step S110, the audio information acquisition step: the audio information input by the user is collected. Specifically, in this step, when the user speaks, the intelligent robot starts collecting the user's voice information.
Next, step S120, the audio information processing step: the received audio information is preprocessed to obtain recording time data. The recording time data include the average per-character time t3 and the maximum per-character time t4.

The average per-character time t3 represents the time the user dwells on each character while speaking. It is determined by averaging over the user's n consecutive recorded messages.
Specifically, this includes the following steps, as shown in Fig. 2:

First, step S210: in a single recording, the per-character time t2 of that recording is calculated from the voiced duration t1 and the number of characters obtained by speech recognition. It should be noted that the above n recording processes are distinguished from one another by a preset end-of-speech (EOS) time. The EOS time is the interval between the moment the user finishes the last character and the moment the program stops recording. The EOS time can be preset to a fixed value t0, for example 3 seconds. Setting the EOS time makes it possible to separate the n consecutive recording processes.
After the EOS time t0 is set, each separated single recording can be processed. In a single recording, the voiced duration t1 (the time from the first appearance of volume to the last) is calculated from the concrete values of the volume over time, and the number of characters produced by continuous speech recognition within this time t1 is counted.
Volume here refers to the volume of the user's speech after noise reduction, excluding any residual noise left by the noise reduction algorithm; that is, noise reduction is assumed to be perfect, so that if the user is not speaking, the volume after noise reduction equals zero. Specifically, the concrete values of the volume over time in a single recording are logged, for example 30 decibels in the 1st millisecond, 70 decibels in the 2nd millisecond, and so on. From these detected values, the time t1 from the first appearance of volume to the last in this recording can be calculated. Meanwhile, continuous speech recognition is performed during the single recording, and the number of characters it produces is counted.
After the voiced duration t1 and the number of recognized characters are determined, the per-character time t2 of the single recording is calculated.

Specifically, the per-character time t2 of a single recording can be calculated by the following formula:

t2 = t1/a (1)

where t2 is the per-character time of the single recording, t1 is the voiced duration of the single recording, and a is the number of characters produced by continuous speech recognition within the voiced duration t1.
Alternatively, the per-character time t2 of a single recording can be calculated by the following formula:

t2 = (t1/a + t1/(a-1))/2 (2)

The average computed by formula (2) yields a per-character time closer to the user's actual speaking pace.
Next, step S220: the average per-character time t3 is obtained from the per-character times t2 of all single recordings in the n consecutive recordings.

From the per-character times of all single recordings in the n consecutive recordings, the user's average per-character time t3 is calculated. Specifically, after n consecutive recordings, the average value t3 of all the per-character times t2 in the n recordings is computed. Here n is preferably a preset value, for example 10.
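The averaging in step S220 can be sketched as follows, under the assumption that each single recording is summarized as a (t1, a) pair (voiced duration and recognized character count) and that each t2 uses formula (1); the names are illustrative, not from the patent.

```python
def average_char_time(recordings, n=10):
    """Average per-character time t3 over the last n recordings.

    recordings: list of (t1, a) pairs, one per single recording.
    Each per-recording time uses formula (1), t2 = t1 / a.
    """
    recent = recordings[-n:]
    t2_values = [t1 / a for t1, a in recent]
    return sum(t2_values) / len(t2_values)
```

For instance, two recordings of (1000 ms, 2 characters) and (3000 ms, 3 characters) give t2 values of 500 and 1000, so t3 = 750 ms.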
It should be noted that in a single recording, when the duration t5 for which the volume is zero reaches the predetermined value t0, the user is considered to have stopped speaking and the recording ends. Moreover, whenever the volume becomes zero during recording, t5 starts counting from zero at that moment; if the volume becomes zero again later in the recording, t5 restarts from zero at that new moment. This ensures the accuracy of both the zero-volume duration between output characters and the zero-volume duration after the user finishes speaking.
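The restart-from-zero behaviour of t5 described above can be sketched as a small state holder. This is a sketch under assumptions: the class and method names are illustrative, and volume is fed in as discrete samples each covering dt milliseconds.

```python
class ZeroVolumeTimer:
    """Tracks t5, the zero-volume duration, restarting it from zero
    every time the post-noise-reduction volume returns to nonzero."""

    def __init__(self):
        self.t5 = 0

    def update(self, volume, dt=1):
        """Feed one volume sample covering dt milliseconds; return t5."""
        if volume == 0:
            self.t5 += dt   # silence continues, t5 accumulates
        else:
            self.t5 = 0     # the user is speaking again, restart t5
        return self.t5
```

A caller would compare the returned t5 against t0 (and, once learned, against t3 and t4) after each sample.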
The maximum per-character time t4 is the largest per-character time across the user's n consecutive recordings. It is determined by collecting all the per-character times of the user's n consecutive recorded messages.

Specifically, this includes the following steps, as shown in Fig. 3:
First, step S310: in a single recording, the per-character time t2 of that recording is calculated from the voiced duration t1 and the number of characters obtained by speech recognition.
It should be noted that, as before, the n recording processes are distinguished by the preset end-of-speech (EOS) time, which can be set to a fixed value t0; setting the EOS time makes it possible to separate the n consecutive recording processes.

After the EOS time t0 is set, each separated single recording can be processed. In a single recording, the voiced duration t1 (the time from the first appearance of volume to the last) is calculated from the concrete values of the volume over time, and the number of characters produced by continuous speech recognition within this time t1 is counted. As in step S210, volume refers to the user's speech volume after noise reduction, which is assumed to be perfect, so that the volume is zero whenever the user is not speaking.
Then the per-character time t2 of the single recording is calculated; specifically, it can be computed by formula (1) or formula (2).
Finally, in step S320, the user's maximum per-character time t4 is obtained from the per-character times t2 of all single recordings in the n consecutive recordings.

Specifically, the largest of the per-character times t2 of all single recordings in the n consecutive recordings is selected as the maximum per-character time t4. Here n is preferably a preset value.
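Selecting t4 is then a plain maximum over the same per-recording values, sketched here with the same assumed (t1, a) representation and formula (1); the names are illustrative.

```python
def max_char_time(recordings, n=10):
    """Maximum per-character time t4 over the last n recordings;
    each recording is a (t1, a) pair and t2 = t1 / a (formula (1))."""
    return max(t1 / a for t1, a in recordings[-n:])
```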
Next, the natural language understanding step S130: the words in the audio information are parsed to obtain a natural language understanding result.

Finally, step S140, the recording time judgment step: the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result are evaluated.
Specifically, in the (n+1)th recording: the zero-volume duration t5 is compared with the preset audio end time t0, and if t5 exceeds t0, the recording ends. If t5 exceeds the average per-character time t3 and the natural language understanding (NLU) result indicates end of recording, the recording ends. If t5 exceeds the average per-character time t3 but the NLU result does not indicate end of recording, t5 is compared with the maximum per-character time t4; the recording ends when t5 > t4, and the value of t0 is adjusted toward t4.
Specifically, in this step there are three cases in which the (n+1)th recording ends. The first is when the zero-volume duration t5 is detected to exceed the preset audio end time t0, in which case the recording ends. The preset audio end time t0 is set before the recording process starts.

The preset audio end time t0 may be set larger than the user's speaking speed and word spacing require. Therefore, to improve the timing of the response output, in the (n+1)th and subsequent detection processes the value of the preset audio end time t0 is gradually reduced so that it converges toward the maximum per-character time t4.
The second case is when the zero-volume duration t5 is detected to exceed the average per-character time t3 and the natural language understanding result indicates end of recording, in which case the recording ends. Specifically, during speech recognition in the (n+1)th recording, if the calculated zero-volume duration t5 in the (n+1)th recorded message exceeds the average per-character time t3, then whether to end the recording is judged from the natural language understanding result of the recorded message.

If the natural language understanding result concludes that the user has finished speaking, that is, a sentence-final modal particle or end-of-sentence punctuation is found in the user's speech, the recording ends.
Otherwise, if the natural language understanding result concludes that the user has not yet finished speaking, for example word segmentation finds that the last word is incomplete, the third case applies: the zero-volume duration t5 is compared with the maximum per-character time t4, the recording ends when t5 > t4, and the value of t0 is adjusted toward t4.
It should be noted that for different users the relationship between the preset audio end time t0 and the user's per-character time is indeterminate: t0 may be larger or smaller than the user's per-character time. During the first n consecutive recordings, since no maximum per-character time t4 or average per-character time t3 is yet available for reference, the recording ends when the zero-volume duration t5 is detected to exceed the preset audio end time t0.
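Putting the three cases and the cold-start fallback together, one possible sketch of the decision is below. The patent only says that t0 is adjusted to approach t4, so the smoothing factor ALPHA, the exact update rule, and all names here are assumptions for illustration.

```python
ALPHA = 0.5  # assumed smoothing factor; the patent does not specify one

def should_stop(t5, t0, t3, t4, nlu_done, learned):
    """Return (stop_recording, new_t0).

    t5: current zero-volume duration; t0: preset audio end time;
    t3/t4: average/maximum per-character time; nlu_done: whether the
    NLU result indicates the utterance is complete; learned: whether
    t3 and t4 are available yet (False during the first n recordings).
    """
    if not learned or t5 > t0:
        return (t5 > t0, t0)             # case 1, and cold-start fallback
    if t5 > t3 and nlu_done:
        return (True, t0)                # case 2: NLU confirms the end
    if t5 > t4:
        new_t0 = t0 + ALPHA * (t4 - t0)  # case 3: stop, pull t0 toward t4
        return (True, new_t0)
    return (False, t0)
```

With t0 = 3000 ms, t3 = 400 ms, t4 = 800 ms and a silence of 900 ms that the NLU does not consider complete, case 3 fires and t0 is pulled halfway toward t4, to 1900 ms.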
The audio processing method and apparatus for an intelligent robot provided by this embodiment judge multiple parameters that characterize speaking speed and, through those judgments, accurately control the timing at which recording stops, while learning each individual user's speaking speed and word spacing, thereby optimizing the timing of the robot's answer output and improving response accuracy.
According to another aspect of the invention, an audio processing apparatus for an intelligent robot is also provided. As shown in Fig. 4, the audio processing apparatus includes an audio information acquisition module, an audio information processing module, a natural language understanding module, and a recording time judgment module.

The audio information acquisition module collects the audio information input by the user;

the audio information processing module preprocesses the audio information to obtain recording time data, the recording time data including an average per-character time t3 and a maximum per-character time t4;

the natural language understanding module parses the words in the audio information to obtain a natural language understanding result; and

the recording time judgment module evaluates the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result.
The recording time judgment module evaluates the average per-character time t3, the maximum per-character time t4, and the zero-volume duration t5, and generates an end-recording instruction in the following cases: the zero-volume duration t5 is compared with the preset audio end time t0, and the recording ends when t5 > t0; the zero-volume duration t5 is compared with the average per-character time t3, and the recording ends when t5 > t3 and the natural language understanding result indicates end of recording; and the zero-volume duration t5 is compared with the maximum per-character time t4, the recording ends when t5 > t4, and the value of t0 is adjusted toward t4.
In one embodiment of the invention, the audio information processing module includes a first per-character time calculating unit and a maximum per-character time calculating unit, as shown in Fig. 5.

The first per-character time calculating unit calculates, in a single recording, the per-character time t2 of that recording from the voiced duration t1 and the number of characters obtained by speech recognition; the maximum per-character time calculating unit obtains the maximum per-character time t4 from the per-character times t2 of all single recordings in n consecutive recordings.
In one embodiment of the invention, the audio information processing module includes a second per-character time calculating unit and an average per-character time calculating unit.

The second per-character time calculating unit calculates, in a single recording, the per-character time t2 of that recording from the voiced duration t1 and the number of characters obtained by speech recognition; the average per-character time calculating unit obtains the average per-character time t3 from the per-character times t2 of all single recordings in n consecutive recordings, as shown in Fig. 6.
As can be seen from the above analysis and from Figs. 5 and 6, the first per-character time calculating unit and the second per-character time calculating unit in the audio information processing module perform the same function. Therefore, in the design they can be merged into a single per-character time calculating unit, after which a maximum per-character time calculating unit and an average per-character time calculating unit are provided, as shown in Fig. 7.
According to a further aspect of the invention, an audio processing apparatus for an intelligent robot is also provided. The audio processing apparatus includes an audio information acquisition circuit and a processor.

The audio information acquisition circuit collects the recorded message input by the user;

the processor preprocesses the audio information to obtain recording time data, the recording time data including an average per-character time t3 and a maximum per-character time t4, and evaluates the average per-character time t3, the maximum per-character time t4, the zero-volume duration t5, and the natural language understanding result.
Specifically, when the processor evaluates the average individual-character time t3, the maximum individual-character time t4, and the zero-volume duration t5, the cases in which an end-of-recording instruction is generated include the following:
(1) the zero-volume duration t5 is compared with the preset audio end time t0; when t5 > t0, the recording ends;
(2) the zero-volume duration t5 is compared with the average individual-character time t3; when t5 > t3 and the natural language understanding result indicates end of recording, the recording ends;
(3) the zero-volume duration t5 is compared with the maximum individual-character time t4; when t5 > t4, the recording ends, and the value of t0 is adjusted so that it approaches t4.
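The three cases above can be sketched as follows. This is a minimal illustration with assumed names; in particular, the step by which t0 is moved toward t4 is an assumption, since the patent only requires that t0 gradually approach t4:

```python
# Illustrative sketch (assumed names) of the three end-of-recording
# conditions evaluated by the processor.

def should_end_recording(t5, t0, t3, t4, nlu_indicates_end):
    """Return (end, new_t0).

    t5: zero-volume (silence) duration observed so far
    t0: preset audio end time
    t3, t4: average / maximum individual-character time
    nlu_indicates_end: True if the natural language understanding
    result indicates the utterance is complete.
    """
    if t5 > t0:                        # case (1): silence exceeds preset limit
        return True, t0
    if t5 > t3 and nlu_indicates_end:  # case (2): silence exceeds average word time
        return True, t0                #           and NLU says the sentence is done
    if t5 > t4:                        # case (3): silence exceeds maximum word time
        new_t0 = t0 - 0.5 * (t0 - t4)  # move t0 halfway toward t4 (step assumed)
        return True, new_t0
    return False, t0
```

Because t0 shrinks toward t4 whenever case (3) fires, the preset timeout adapts to each user's measured speech rate over successive recordings.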
In summary, the audio processing method and apparatus for an intelligent robot provided by the invention evaluate multiple parameters characterizing speech rate, so that the moment at which recording stops is controlled accurately. By learning each individual user's speech rate and inter-word intervals, the method optimizes the timing of the robot's answer output and improves answer accuracy.
Although embodiments are disclosed above, the content described is merely an implementation adopted to facilitate understanding of the invention and does not limit it. Any person skilled in the art to which this invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed herein; however, the scope of patent protection of the invention shall still be subject to the appended claims.
Claims (10)
1. An audio processing method for an intelligent robot, comprising:
an audio information acquisition step of collecting audio information input by a user;
an audio information processing step of preprocessing the audio information to obtain recording duration data, the recording duration data comprising an average individual-character time t3 and a maximum individual-character time t4;
a natural language understanding step of parsing the words in the audio information to obtain a natural language understanding result; and
a recording duration judgment step of judging the average individual-character time t3, the maximum individual-character time t4, a zero-volume duration t5, and the natural language understanding result, and generating an end-of-recording instruction when the judgment result satisfies an end-of-recording condition.
2. The audio processing method according to claim 1, wherein the recording duration judgment step comprises:
comparing the zero-volume duration t5 with a preset audio end time t0, and ending the recording when t5 > t0;
comparing the zero-volume duration t5 with the average individual-character time t3, and ending the recording when t5 > t3 and the natural language understanding result indicates end of recording; and
comparing the zero-volume duration t5 with the maximum individual-character time t4, ending the recording when t5 > t4, and adjusting the value of t0 so that it gradually decreases toward the maximum individual-character time t4.
3. The audio processing method according to claim 2, wherein obtaining the maximum individual-character time t4 comprises:
in a single recording, calculating the individual-character time t2 of the single recording from the voiced duration t1 and the number of words obtained by speech recognition; and
obtaining the maximum individual-character time t4 from the individual-character times t2 of all single recordings within N consecutive recordings.
4. The audio processing method according to claim 2, wherein obtaining the average individual-character time t3 comprises:
in a single recording, calculating the individual-character time t2 of the single recording from the voiced duration t1 and the number of words obtained by speech recognition; and
obtaining the average individual-character time t3 from the individual-character times t2 of all single recordings within N consecutive recordings.
5. The audio processing method according to claim 3 or 4, wherein the individual-character time t2 is calculated by the following formula:
t2 = t1/a or t2 = (t1/a + t1/(a-1))/2
where a is the number of words recognized within the voiced duration t1.
6. An audio processing apparatus for an intelligent robot, comprising:
an audio information acquisition module that collects audio information input by a user;
an audio information processing module that preprocesses the audio information to obtain recording duration data, the recording duration data comprising an average individual-character time t3 and a maximum individual-character time t4;
a natural language understanding module that parses the words in the audio information to obtain a natural language understanding result; and
a recording duration judgment module that judges the average individual-character time t3, the maximum individual-character time t4, a zero-volume duration t5, and the natural language understanding result, and generates an end-of-recording instruction when the judgment result satisfies an end-of-recording condition.
7. The audio processing apparatus according to claim 6, wherein the recording duration judgment module is configured to:
compare the zero-volume duration t5 with a preset audio end time t0, and end the recording when t5 > t0;
compare the zero-volume duration t5 with the average individual-character time t3, and end the recording when t5 > t3 and the natural language understanding result indicates end of recording; and
compare the zero-volume duration t5 with the maximum individual-character time t4, end the recording when t5 > t4, and adjust the value of t0 so that it gradually decreases toward the maximum individual-character time t4.
8. The audio processing apparatus according to claim 7, wherein the audio information processing module comprises:
a first individual-character time calculating unit that, in a single recording, calculates the individual-character time t2 of the single recording from the voiced duration t1 and the number of words obtained by speech recognition; and
a maximum individual-character time calculating unit that obtains the maximum individual-character time t4 from the individual-character times t2 of all single recordings within N consecutive recordings.
9. The audio processing apparatus according to claim 7, wherein the audio information processing module comprises:
a second individual-character time calculating unit that, in a single recording, calculates the individual-character time t2 of the single recording from the voiced duration t1 and the number of words obtained by speech recognition; and
an average individual-character time calculating unit that obtains the average individual-character time t3 from the individual-character times t2 of all single recordings within N consecutive recordings.
10. An audio processing apparatus for an intelligent robot, comprising:
an audio information acquisition circuit that collects audio information input by a user; and
a processor that preprocesses the audio information to obtain recording duration data, the recording duration data comprising an average individual-character time t3 and a maximum individual-character time t4,
parses the words in the audio information to obtain a natural language understanding result, and
judges the average individual-character time t3, the maximum individual-character time t4, a zero-volume duration t5, and the natural language understanding result, generating an end-of-recording instruction when the judgment result satisfies an end-of-recording condition,
wherein the judging by the processor of the average individual-character time t3, the maximum individual-character time t4, the zero-volume duration t5, and the natural language understanding result comprises:
comparing the zero-volume duration t5 with a preset audio end time t0, and ending the recording when t5 > t0;
comparing the zero-volume duration t5 with the average individual-character time t3, and ending the recording when t5 > t3 and the natural language understanding result indicates end of recording; and
comparing the zero-volume duration t5 with the maximum individual-character time t4, ending the recording when t5 > t4, and adjusting the value of t0 so that it gradually decreases toward the maximum individual-character time t4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610028052.9A CN105719670B (en) | 2016-01-15 | 2016-01-15 | A kind of audio-frequency processing method and device towards intelligent robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105719670A CN105719670A (en) | 2016-06-29 |
CN105719670B true CN105719670B (en) | 2018-02-06 |
Family
ID=56147155
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610028052.9A Active CN105719670B (en) | 2016-01-15 | 2016-01-15 | A kind of audio-frequency processing method and device towards intelligent robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105719670B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101246687A (en) * | 2008-03-20 | 2008-08-20 | 北京航空航天大学 | Intelligent voice interaction system and method thereof |
CN104253902A (en) * | 2014-07-21 | 2014-12-31 | 宋婉毓 | Method for voice interaction with intelligent voice device |
CN105144286A (en) * | 2013-03-14 | 2015-12-09 | 托伊托克有限公司 | Systems and methods for interactive synthetic character dialogue |
CN105204628A (en) * | 2015-09-01 | 2015-12-30 | 涂悦 | Voice control method based on visual awakening |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||