CN107705785A - Sound source localization method for a smart speaker, smart speaker, and computer-readable medium - Google Patents
Sound source localization method for a smart speaker, smart speaker, and computer-readable medium
- Publication number
- CN107705785A (application number CN201710647123.8A)
- Authority
- CN
- China
- Prior art keywords
- voice signal
- sound source
- signal
- target sound
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Abstract
The present invention provides a sound source localization method for a smart speaker, a smart speaker, and a computer-readable medium. The method includes: when it is determined that a voice signal emitted by a target sound source needs to be collected, obtaining a first voice signal, carrying a preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker; obtaining, for each pair of signal receiving modules, the time difference with which the two signal receiving modules of the pair receive the first voice signal; and determining, according to the first voice signal and the time difference of each pair of signal receiving modules, the direction of the target sound source that emitted the first voice signal. With this technical scheme, the target sound source can be localized even when several sound sources are present, so the smart speaker can collect only the voice signal of the target sound source in the determined direction and then provide services to the user corresponding to that target sound source. The scheme also effectively enriches the functions of the smart speaker, making it more flexible and convenient to use.
Description
【Technical field】
The present invention relates to the field of computer application technology, and in particular to a sound source localization method for a smart speaker, a smart speaker, and a computer-readable medium.
【Background technology】
With the development of science and technology, smart devices have entered users' homes and formed intelligent home environments. A smart speaker, as one kind of smart device in the smart home, can help the user play music, check the weather, chat, hold a dialogue, and so on. To do this, the smart speaker needs functions such as speech recognition, semantic parsing, content services, response-wording generation, and spoken feedback via text-to-speech (TTS).
In the prior art, a smart speaker is provided with a default wake-up word and may stay in a sleep state when not working. When the user needs to start the smart speaker, the user can call the wake-up word of the smart speaker by voice; the smart speaker detects its own wake-up word, is woken up, and enters the working state. It then receives the voice query (Query) input by the user, performs speech recognition and semantic parsing, queries the result of the voice query, generates feedback wording according to the query result, converts the feedback wording by TTS, and reports the feedback information to the user by voice.
However, in the prior art the smart speaker cannot localize a sound source. If several users around the smart speaker call it at the same time, which is equivalent to multiple sound sources, the operation of the smart speaker becomes confused and the voice queries cannot be handled.
【Summary of the invention】
The present invention provides a sound source localization method for a smart speaker, a smart speaker, and a computer-readable medium, which are used to enable the smart speaker to localize a sound source.
The present invention provides a sound source localization method for a smart speaker, the method comprising:
when it is determined that a voice signal emitted by a target sound source needs to be collected, obtaining a first voice signal, carrying a preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker, the preset wake-up word being used by the target sound source to wake up the smart speaker;
obtaining, for each pair of signal receiving modules, the time difference with which the two signal receiving modules of the pair receive the first voice signal;
determining, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the direction of the target sound source that emitted the first voice signal.
Optionally, in the method as described above, after determining, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the direction of the target sound source that emitted the first voice signal, the method further comprises:
rotating a positioning indicator mark to the direction of the target sound source and/or lighting a positioning indicator lamp toward the direction of the target sound source, so as to inform the user corresponding to the target sound source that the direction of the target sound source has been determined.
Optionally, in the method as described above, obtaining the time difference with which the two signal receiving modules of each pair receive the first voice signal specifically comprises:
taking a first pair of the at least two pairs of signal receiving modules as a reference, selecting a candidate direction θ of the target sound source;
for each pair of signal receiving modules, obtaining, according to the candidate direction θ of the target sound source, the time difference t0 with which the two signal receiving modules of that pair receive the first voice signal, where t0 is a function of θ.
Optionally, in the method as described above, determining, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the direction of the target sound source that emitted the first voice signal specifically comprises:
aligning in time, according to the time difference t0 of each pair of signal receiving modules, the first voice signals received by the two signal receiving modules of that pair;
calculating, for each pair of signal receiving modules, the correlation of the two aligned first voice signals;
superimposing the correlations of all the pairs of signal receiving modules to obtain an overall correlation;
taking, as the target direction of the target sound source, the candidate direction θ at which the overall correlation reaches its maximum.
Optionally, in the method as described above, aligning in time, according to the time difference t0 with which each pair of signal receiving modules receives the first voice signal, the first voice signals received by the two signal receiving modules of that pair specifically comprises:
delaying by the time difference t0 the first voice signal that is received first of the two first voice signals received by each pair of signal receiving modules, or advancing by the time difference t0 the first voice signal that is received later of the two, so that the two first voice signals are aligned in time.
Optionally, in the method as described above, before obtaining the first voice signal, carrying the preset wake-up word and emitted by the target sound source, as received by the at least two pairs of signal receiving modules on the smart speaker, the method further comprises:
determining that the voice signal emitted by the target sound source needs to be collected.
Optionally, determining that the voice signal emitted by the target sound source needs to be collected specifically comprises:
obtaining the first voice signal, carrying the preset wake-up word, emitted by the target sound source;
extracting the preset wake-up word from the first voice signal;
extracting the voiceprint feature of the first voice signal;
judging, according to a pre-stored correspondence between the preset wake-up word and the voiceprint feature of the target sound source, whether the preset wake-up word matches the voiceprint feature of the first voice signal;
if they match, determining that the voice signal emitted by the target sound source needs to be collected.
Optionally, in the method as described above, before obtaining the first voice signal, carrying the preset wake-up word, emitted by the target sound source, the method further comprises:
receiving a second voice signal, carrying the preset wake-up word, input by voice by the user corresponding to the target sound source;
extracting the preset wake-up word from the second voice signal;
extracting the voiceprint feature of the second voice signal as the voiceprint feature of the target sound source;
establishing and storing the correspondence between the preset wake-up word and the voiceprint feature of the target sound source.
The present invention provides a smart speaker, the smart speaker comprising:
a signal acquisition module, configured to, when it is determined that a voice signal emitted by a target sound source needs to be collected, obtain a first voice signal, carrying a preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker, the preset wake-up word being used by the target sound source to wake up the smart speaker;
a time-difference acquisition module, configured to obtain the time difference with which the two signal receiving modules of each pair receive the first voice signal;
a localization module, configured to determine, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the direction of the target sound source that emitted the first voice signal.
Optionally, the smart speaker as described above further comprises:
a positioning indication module, configured to rotate a positioning indicator mark to the direction of the target sound source and/or to light a positioning indicator lamp toward the direction of the target sound source, so as to inform the user corresponding to the target sound source that the direction of the target sound source has been determined.
Optionally, in the smart speaker as described above, the time-difference acquisition module is specifically configured to:
take a first pair of the at least two pairs of signal receiving modules as a reference and select a candidate direction θ of the target sound source;
for each pair of signal receiving modules, obtain, according to the candidate direction θ of the target sound source, the time difference t0 with which the two signal receiving modules of that pair receive the first voice signal, where t0 is a function of θ.
Optionally, in the smart speaker as described above, the localization module is specifically configured to:
align in time, according to the time difference t0 of each pair of signal receiving modules, the first voice signals received by the two signal receiving modules of that pair;
calculate, for each pair of signal receiving modules, the correlation of the two aligned first voice signals;
superimpose the correlations of all the pairs of signal receiving modules to obtain an overall correlation;
take, as the target direction of the target sound source, the candidate direction θ at which the overall correlation reaches its maximum.
Optionally, in the smart speaker as described above, the localization module is specifically configured to delay by the time difference t0 the first voice signal that is received first of the two first voice signals received by each pair of signal receiving modules, or to advance by the time difference t0 the first voice signal that is received later of the two, so that the two first voice signals are aligned in time.
Optionally, the smart speaker as described above further comprises:
a determining module, configured to determine that the voice signal emitted by the target sound source needs to be collected.
Further, the determining module is specifically configured to:
obtain the first voice signal, carrying the preset wake-up word, emitted by the target sound source;
extract the preset wake-up word from the first voice signal;
extract the voiceprint feature of the first voice signal;
judge, according to a pre-stored correspondence between the preset wake-up word and the voiceprint feature of the target sound source, whether the preset wake-up word matches the voiceprint feature of the first voice signal;
if they match, determine that the voice signal emitted by the target sound source needs to be collected.
Optionally, the smart speaker as described above further comprises:
a receiving module, configured to receive a second voice signal, carrying the preset wake-up word, input by voice by the user corresponding to the target sound source;
an extraction module, configured to extract the preset wake-up word from the second voice signal, and further configured to extract the voiceprint feature of the second voice signal as the voiceprint feature of the target sound source;
an establishing module, configured to establish and store the correspondence between the preset wake-up word and the voiceprint feature of the target sound source.
The present invention also provides a smart speaker, comprising a plurality of microphones for receiving and transmitting signals, the smart speaker further comprising:
one or more processors; and
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the sound source localization method for a smart speaker as described above.
The present invention also provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the sound source localization method for a smart speaker as described above.
With the sound source localization method for a smart speaker, the smart speaker, and the computer-readable medium of the present invention, when it is determined that a voice signal emitted by a target sound source needs to be collected, the first voice signal, carrying the preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker is obtained; the time difference with which the two signal receiving modules of each pair receive the first voice signal is obtained; and the direction of the target sound source that emitted the first voice signal is determined according to the first voice signal and the time difference of each pair of signal receiving modules. With this technical scheme, the target sound source can be localized even when several sound sources are present, so the smart speaker can collect only the voice signal of the target sound source in the determined direction and then provide services to the user corresponding to that target sound source. The scheme also effectively enriches the functions of the smart speaker, making it more flexible and convenient to use.
【Brief description of the drawings】
Fig. 1 is a flow chart of a first embodiment of the sound source localization method for a smart speaker of the present invention.
Fig. 2 is a diagram of one application scenario of the sound source localization method for a smart speaker of the present invention.
Fig. 3 is a diagram of another application scenario of the sound source localization method for a smart speaker of the present invention.
Fig. 4 is a flow chart of a second embodiment of the sound source localization method for a smart speaker of the present invention.
Fig. 5 is a structural diagram of a first embodiment of the smart speaker of the present invention.
Fig. 6 is a structural diagram of a second embodiment of the smart speaker of the present invention.
Fig. 7 is a structural diagram of a third embodiment of the smart speaker of the present invention.
Fig. 8 is an exemplary diagram of a smart speaker provided by the present invention.
【Embodiment】
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of a first embodiment of the sound source localization method for a smart speaker of the present invention. As shown in Fig. 1, the sound source localization method for a smart speaker of this embodiment may specifically include the following steps:
100. When it is determined that a voice signal emitted by a target sound source needs to be collected, obtain a first voice signal, carrying a preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker.
The executing body of the sound source localization method of this embodiment is the smart speaker. The preset wake-up word of this embodiment is used by the target sound source to wake up the smart speaker, and the target sound source of this embodiment is preferably a user interacting with the smart speaker. In order to receive the user's voice query (Query) and report feedback information to the user based on that query, the smart speaker of this embodiment is provided with signal receiving modules and signal transmitting modules. For example, the signal receiving modules and signal transmitting modules on the smart speaker may be integrated together, for instance as microphones integrated in the smart speaker, so that signals can be received from all directions and feedback information can be broadcast in all directions. Optionally, in this embodiment an even number of microphones may be arranged symmetrically on the smart speaker, for example four microphones, or four groups of microphones of two each, distributed uniformly around the smart speaker.
In the usage scenario of this embodiment, multiple sound sources may call the smart speaker. Since the smart speaker cannot handle several voice queries at the same time, it can determine, through its own analysis, which target sound source's voice signal needs to be collected: for example, it may decide to collect the voice signal that was received first, or it may pick the target sound source from among the multiple sound sources by some other strategy and determine that the voice signal emitted by that target sound source needs to be collected. The target sound source then needs to be localized further. In this embodiment, the first voice signal, carrying the preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker is first obtained, so that the target sound source can be localized by means of the first voice signals received by the pairs of signal receiving modules.
The signal receiving module pairs of this embodiment may be microphone pairs on the smart speaker. Each microphone pair contains two microphones, which may be any two microphones on the smart speaker. Because a single microphone pair may locate the target sound source in either of two symmetric directions, localization with one pair alone is not accurate enough. In this embodiment, to determine the direction of the target sound source, at least two microphone pairs must be selected. The relationship between the selected microphone pairs is not limited in this embodiment: for example, one pair may consist of two adjacent microphones, and the other pair may also be adjacent, or may be two opposite microphones on a diagonal. Since different microphones are at different distances from the target sound source, the moments at which different microphones receive the same voice signal emitted by the target sound source differ.
In this embodiment, if it is determined that the voice signal emitted by the target sound source needs to be collected, the first voice signal, carrying the preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker needs to be obtained. For example, if the preset wake-up word of the smart speaker is "great Bai", a user utterance of "great Bai, great Bai" can serve as the first voice signal emitted by the target sound source. That is, in this embodiment, the target sound source can be localized directly from the first voice signal with which it wakes up the smart speaker using the preset wake-up word, without collecting further voice signals from the target sound source.
101. Obtain, for each pair of signal receiving modules, the time difference with which the two signal receiving modules of the pair receive the first voice signal.
For example, when the signal receiving module pairs are microphone pairs, different microphone pairs are at different distances from the target sound source, so for each microphone pair there is a time difference between the moments at which its two microphones receive the first voice signal of the target sound source. Optionally, step 101 may specifically include: taking a first pair of the at least two pairs of signal receiving modules as a reference, selecting a candidate direction θ of the target sound source; and, for each pair of signal receiving modules, obtaining, according to the candidate direction θ of the target sound source, the time difference t0 with which the two signal receiving modules of that pair receive the first voice signal, where t0 is a function of θ.
Specifically, because a microphone pair cannot directly measure the time difference exactly, in this embodiment a candidate direction θ of the target sound source may first be assumed; based on that candidate direction θ, the time difference t0 with which the two microphones of each pair receive the first voice signal of the target sound source can then be expressed. The target sound source in this embodiment is a far-field source, i.e. the distance between the target sound source and the microphones is much larger than the distance between the microphones, so the voice signal emitted by the target sound source can be regarded as reaching the microphones as parallel waves. A first microphone pair of the at least two pairs is selected as the reference, and the candidate direction of the target sound source relative to this reference pair is taken as θ. Since the distance between the two microphones of each pair is known, the time difference t0 with which the two microphones of each pair receive the first voice signal can be expressed from the candidate direction θ of the target sound source and the distance between the two microphones of that pair, so t0 is a function of the candidate direction θ. The candidate direction θ can be any angle from 0 to 360 degrees in space, that is, the target sound source may be located at any position over the full 0-360 degree range.
Fig. 2 is a diagram of one application scenario of the sound source localization method for a smart speaker of the present invention. As shown in Fig. 2, A, B, and C are microphones on the smart speaker; in this embodiment A and B, and A and C, form microphone pairs, and the first voice signal emitted by the target sound source reaches each microphone as a parallel wave. As shown in Fig. 2, taking microphone pair A and B as the reference, the candidate direction of the target sound source is taken as θ, and an auxiliary line BO is drawn perpendicular to the parallel wave direction of the target sound source. AO is then the difference between the distances the first voice signal travels to reach microphone A and to reach microphone B, with length Δd = L × cos θ, where L equals the distance between microphones A and B. Further, the time difference t0 with which microphones A and B receive the first voice signal emitted by the target sound source equals the length of AO divided by the speed of sound V, so t0 can be expressed as t0 = Δd / V = L × cos θ / V, i.e. t0 is a function of θ.
In Fig. 2, L also equals the distance between A and C, the other microphone pair. On the smart speaker, the line through microphones A and C is perpendicular to the line through microphones A and B, so by the geometry of the triangle the time difference t0 with which microphones A and C receive the first voice signal emitted by the target sound source can be expressed as t0 = L × sin θ / V.
Fig. 3 is a diagram of another application scenario of the sound source localization method for a smart speaker of the present invention. The treatment is similar to that of Fig. 2: the time difference t0 with which microphones A and B receive the first voice signal emitted by the target sound source can be expressed as t0 = Δd / V = L × cos θ / V. If the line through microphones A and C is parallel to the propagation direction of the parallel wave of the first voice signal, then the time difference t0 with which microphones A and C receive the first voice signal emitted by the target sound source equals the length L' of AC divided by the speed of sound V.
Figs. 2 and 3 above illustrate only two special cases. In practice, for the at least two pairs of signal receiving modules of a smart speaker in any scene, a first pair of signal receiving modules can always be chosen as the reference for the candidate direction θ of the target sound source, and, according to the geometric relationship of the signal receiving module pairs on the smart speaker, the time difference with which the two signal receiving modules of each pair receive the first voice signal can be expressed in terms of the candidate direction θ of the target sound source.
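For illustration only (this code is not part of the patent text), the following Python sketch computes the pair-wise time difference t0 as a function of a candidate direction θ for the geometry of Fig. 2; the microphone coordinates, spacing, and speed of sound are assumed values.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def pair_time_difference(mic_a, mic_b, theta):
    """Far-field time difference (seconds) between two microphones for a plane
    wave arriving from candidate direction theta (radians, in the microphone
    plane). Returns t(mic_a) - t(mic_b): positive when mic_b, lying closer to
    the source along theta, receives the wavefront first."""
    direction = np.array([np.cos(theta), np.sin(theta)])   # toward the assumed source
    baseline = np.asarray(mic_b) - np.asarray(mic_a)
    delta_d = np.dot(baseline, direction)                  # path-length difference
    return delta_d / SPEED_OF_SOUND

# Assumed coordinates reproducing the Fig. 2 layout: pairs A-B and A-C are
# perpendicular and share the same spacing L.
L = 0.10  # 10 cm spacing, assumed
mic_A, mic_B, mic_C = (0.0, 0.0), (L, 0.0), (0.0, L)

theta = np.deg2rad(30.0)                              # one candidate direction
t0_AB = pair_time_difference(mic_A, mic_B, theta)     # equals L*cos(theta)/V
t0_AC = pair_time_difference(mic_A, mic_C, theta)     # equals L*sin(theta)/V
print(t0_AB, t0_AC)
```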
102. Determine, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the direction of the target sound source that emitted the first voice signal.
When the signal receiving module pairs are microphone pairs, for each microphone pair the time difference with which its two microphones receive the first voice signal can be expressed as a function of the candidate direction θ of the target sound source, and the candidate direction θ can take any angle in the range of 0-360 degrees. The first voice signals received by the two microphones can then be aligned in time; when the alignment is correct, the two first voice signals should exhibit the strongest correlation. By traversing the angle of each candidate direction and finding the angle at which the correlation between the two first voice signals is maximal, the direction angle of the target sound source is obtained.
That is, step 102 may specifically include the following steps:
(a1) According to the time difference t0 of each pair of signal receiving modules, align in time the first voice signals received by the two signal receiving modules of that pair.
In this embodiment, the alignment may be done by delaying by the time difference t0 the first voice signal that each pair receives first, or by advancing by the time difference t0 the first voice signal that each pair receives later, so that the two first voice signals are aligned in time; a minimal sketch of this step is given after this list.
(a2) Calculate, for each pair of signal receiving modules, the correlation of the two aligned first voice signals.
(a3) Superimpose the correlations of all the pairs of signal receiving modules to obtain an overall correlation.
(a4) Take, as the target direction of the target sound source, the candidate direction θ at which the overall correlation reaches its maximum.
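As an illustrative sketch only (the patent gives no code), step (a1) can be realized by shifting one channel by the number of samples corresponding to t0; the sampling rate and helper name are assumptions.

```python
import numpy as np

def align_pair(sig_first, sig_second, t0, sample_rate=16000):
    """Step (a1): delay the first-received signal by t0 so that it lines up in
    time with the later-received one. sample_rate is an assumed value."""
    shift = int(round(abs(t0) * sample_rate))  # the delay expressed in samples
    if shift == 0:
        return sig_first, sig_second
    # Delaying sig_first by `shift` samples amounts to comparing sig_first[:-shift]
    # against sig_second[shift:], i.e. the overlapping, time-aligned portions.
    return sig_first[:-shift], sig_second[shift:]
```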
If only one microphone pair were selected when determining the direction of the target sound source that emitted the first voice signal, for example microphones A and C in Fig. 2, then, as shown in Fig. 2, there may also be a candidate target sound source on the left of AC, symmetric to the target sound source about AC, for which the correlation of the aligned first voice signals also reaches the maximum, so the direction of the target sound source could not be determined uniquely. If microphone pair A and B is additionally selected so that the two pairs jointly determine the direction of the target sound source, the direction can be determined uniquely. Therefore, in this embodiment at least two pairs of signal receiving modules, such as microphone pairs, must be obtained in order to uniquely determine the direction of the target sound source that emitted the first voice signal.
Specifically, for each pair of signal receiving modules, the first voice signals received by the two signal receiving modules of that pair are aligned, the alignment being done as described above. For example, for a certain microphone pair a and b, microphone a receives the first voice signal Y1 first and microphone b then receives the first voice signal Y2, with time difference t0; the first voice signal Y1 received by a can be delayed by t0, or the first voice signal Y2 received by b can be advanced by t0. Suppose in this embodiment that the first-received signal is delayed, giving the delayed first voice signal Y1'. The correlation of Y1' and Y2 can then be calculated. For each microphone pair, the corresponding correlation is obtained in this way, and the correlations of all the microphone pairs are then added to obtain the overall correlation. Because the delay t0 applied to Y1' is itself a function of θ, step (a4) above can determine, by traversing each θ, the value of the overall correlation corresponding to each candidate direction θ of the target sound source, and the candidate direction θ at which the overall correlation reaches its maximum is taken as the target direction of the target sound source.
In other words, if the selected candidate direction θ of the target sound source happens to be the true direction of the source, then after the first voice signals received by the two microphones are aligned on the time axis according to the time difference computed from that candidate direction θ, the two signals should show the strongest correlation. Conversely, if the selected candidate direction θ is not the true direction of the source, then after alignment by the time difference computed from that θ, the correlation between the first voice signals received by the two microphones weakens. Therefore, by examining, for each candidate direction θ, the correlation of the aligned first voice signals of the two microphones, the likelihood that the first voice signal arrived from each candidate direction θ can be judged: the stronger the correlation, the more likely the sound is incident from that direction θ.
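As a minimal sketch under stated assumptions (not the patent's implementation), the following code puts steps (a1)-(a4) together, reusing pair_time_difference and align_pair from the sketches above: for every candidate angle θ it predicts each pair's delay t0(θ), aligns the pair, sums the normalized correlations over all pairs, and returns the θ with the largest overall correlation. Function and parameter names are assumptions.

```python
def normalized_correlation(x, y):
    """Correlation of two aligned segments, normalized to be scale-free."""
    n = min(len(x), len(y))
    x, y = np.asarray(x[:n]), np.asarray(y[:n])
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(np.dot(x, y) / denom) if denom > 0 else 0.0

def locate_target_source(pair_signals, pair_positions, sample_rate=16000):
    """pair_signals: list of (sig_a, sig_b) first-voice-signal pairs;
    pair_positions: list of ((xa, ya), (xb, yb)) microphone coordinates.
    Returns the candidate direction (degrees) with the largest overall correlation."""
    best_theta, best_score = None, -np.inf
    for theta_deg in range(360):                      # traverse 0-360 degrees
        theta = np.deg2rad(theta_deg)
        total = 0.0
        for (sig_a, sig_b), (pos_a, pos_b) in zip(pair_signals, pair_positions):
            t0 = pair_time_difference(pos_a, pos_b, theta)
            # Delay whichever channel this candidate direction says was reached first.
            if t0 >= 0:   # mic b would receive the wavefront first
                a, b = align_pair(sig_b, sig_a, t0, sample_rate)
            else:
                a, b = align_pair(sig_a, sig_b, -t0, sample_rate)
            total += normalized_correlation(a, b)     # steps (a2) and (a3)
        if total > best_score:                        # step (a4)
            best_score, best_theta = total, theta_deg
    return best_theta
```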
Optionally, after step 102, i.e. after determining, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the direction of the target sound source that emitted the first voice signal, the method may further include: rotating a positioning indicator mark to the direction of the target sound source and/or lighting a positioning indicator lamp toward the direction of the target sound source, so as to inform the user corresponding to the target sound source that the direction of the target sound source has been determined.
That is, after localizing the target sound source the smart speaker needs to give the user of that target sound source some feedback, to tell the user that the target sound source he or she produced has been localized and that the user can now interact with the smart speaker and be served by it. In this embodiment a positioning indicator mark may be provided on the smart speaker, for example a rotatable pointer; after the smart speaker has localized the target sound source, the positioning indicator mark can be rotated to the direction of the target sound source, so that the user in that direction can see that his or her target sound source has been localized. Alternatively or additionally, a positioning indicator lamp may be provided on the smart speaker, for example on the pointer of the positioning indicator mark; after the smart speaker has localized the direction of the target sound source, the positioning indicator lamp can be lit toward that direction to inform the user corresponding to the target sound source that the direction of the target sound source has been determined. The two manners can be used separately or in combination.
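Purely as an illustrative assumption (the patent only states that a lamp is lit toward the located direction), a ring of indicator lamps around the speaker could mark the bearing as sketched below; the lamp count and the driver interface are hypothetical.

```python
def led_index_for_direction(theta_deg, num_leds=12):
    """Map a located direction (degrees, 0-360) to the nearest lamp on an
    assumed ring of num_leds evenly spaced lamps around the speaker."""
    step = 360.0 / num_leds
    return int(round(theta_deg / step)) % num_leds

def indicate_direction(theta_deg, led_driver, num_leds=12):
    """Hypothetical driver call: light only the lamp facing the target source."""
    idx = led_index_for_direction(theta_deg, num_leds)
    for i in range(num_leds):
        led_driver.set(i, on=(i == idx))   # led_driver is an assumed interface
```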
With the sound source localization method for a smart speaker of this embodiment, when it is determined that a voice signal emitted by a target sound source needs to be collected, the first voice signal, carrying the preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker is obtained; the time difference with which the two signal receiving modules of each pair receive the first voice signal is obtained; and the direction of the target sound source that emitted the first voice signal is determined according to the first voice signal and the time differences. With the technical scheme of this embodiment, the target sound source can be localized even when several sound sources are present, so the smart speaker can collect only the voice signal of the target sound source in the determined direction and then provide services to the corresponding user; the scheme also effectively enriches the functions of the smart speaker, making it more flexible and convenient to use.
Fig. 4 is a flow chart of a second embodiment of the sound source localization method for a smart speaker of the present invention. As shown in Fig. 4, the sound source localization method for a smart speaker of this embodiment, on the basis of the technical scheme of the embodiment above, introduces the technical scheme of the present invention in further detail. As shown in Fig. 4, the method may specifically further include the following:
200th, default the second voice signal for waking up word of carrying of user speech input corresponding to target sound source is received;
201st, default wake-up word is extracted from the second voice signal;
202nd, the vocal print feature of the second voice signal, the vocal print feature as target sound source are extracted;
203rd, establish and store the corresponding relation of the default vocal print feature for waking up word and target sound source;
In the application scenario of the sound source localization method of this embodiment, the smart speaker can support adding preset wake-up words. For example, whereas in the prior art the wake-up word of a smart speaker cannot be changed, in this embodiment the owner of the smart speaker can set preset wake-up words on the smart speaker for himself or for family members. For example, if the default wake-up word of the smart speaker is "small A" and the owner is the first person to talk to the smart speaker after it is purchased, this default wake-up word can serve as the owner's private wake-up word. With the owner's consent, other family members can also be allowed to set their own private wake-up words on the smart speaker. In a similar manner, when the user corresponding to the target sound source sets a private wake-up word, that user can input by voice a second voice signal carrying the preset wake-up word, for instance by calling "De-Lovely"; "De-Lovely" is then the preset wake-up word set on the smart speaker by the user of that target sound source. The smart speaker then extracts the preset wake-up word from the second voice signal, extracts the voiceprint feature of the second voice signal as the voiceprint feature of the target sound source, and establishes and stores the correspondence between the preset wake-up word and the voiceprint feature of the target sound source. That is, the voiceprint feature of any voice signal that calls this preset wake-up word must be the voiceprint feature in the correspondence, or equivalently the wake-up word carried in a voice signal spoken with that voiceprint feature must be the preset wake-up word in the correspondence; otherwise the smart speaker ignores it.
The above process of establishing and storing the correspondence between the preset wake-up word and the voiceprint feature of the target sound source can be carried out in advance, so that the correspondence can later be used directly for detection.
204. Obtain the first voice signal, carrying the preset wake-up word, emitted by the target sound source.
205. Extract the preset wake-up word from the first voice signal.
206. Extract the voiceprint feature of the first voice signal.
207. Judge, according to the pre-stored correspondence between the preset wake-up word and the voiceprint feature of the target sound source, whether the preset wake-up word matches the voiceprint feature of the first voice signal; if they match, perform step 208; otherwise, perform step 209.
208. Determine that the voice signal emitted by the target sound source needs to be collected, and end.
That is, according to the correspondence between the preset wake-up word and the voiceprint feature of the target sound source obtained in step 203, it is judged whether the preset wake-up word obtained in step 205 matches the voiceprint feature extracted in step 206. If they match, it is determined that the voice signal emitted by the target sound source needs to be collected, and the technical scheme of the embodiment shown in Fig. 1 above can then be executed.
209. Determine that the preset wake-up word and the voiceprint feature do not match, and perform no operation for the time being.
Alternatively, if the smart speaker determines that the current preset wake-up word and the voiceprint feature do not match, it may also give a voice prompt such as "Sorry, the wake-up word you used is wrong, and no service can be provided to you for now" or a similar prompt message.
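As an illustrative sketch only (the patent does not specify how voiceprints are extracted or compared), the enrollment of steps 200-203 and the check of steps 204-209 could be organized as below; extract_wake_word, extract_voiceprint, the similarity threshold, and the in-memory store are all assumptions.

```python
import numpy as np

WAKE_WORD_REGISTRY = {}   # assumed store: preset wake-up word -> enrolled voiceprint
MATCH_THRESHOLD = 0.8     # assumed cosine-similarity threshold

def enroll_user(second_voice_signal, extract_wake_word, extract_voiceprint):
    """Steps 200-203: store the correspondence between the preset wake-up word
    and the voiceprint feature of the target sound source."""
    word = extract_wake_word(second_voice_signal)          # assumed keyword spotter
    voiceprint = extract_voiceprint(second_voice_signal)   # assumed embedding extractor
    WAKE_WORD_REGISTRY[word] = np.asarray(voiceprint)

def should_collect(first_voice_signal, extract_wake_word, extract_voiceprint):
    """Steps 204-209: collect the target source's voice only when the spoken
    wake-up word and the speaker's voiceprint match an enrolled pair."""
    word = extract_wake_word(first_voice_signal)
    if word not in WAKE_WORD_REGISTRY:
        return False                                       # step 209: ignore or prompt
    enrolled = WAKE_WORD_REGISTRY[word]
    probe = np.asarray(extract_voiceprint(first_voice_signal))
    similarity = np.dot(enrolled, probe) / (np.linalg.norm(enrolled) * np.linalg.norm(probe))
    return similarity >= MATCH_THRESHOLD                   # step 208 when True
```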
With the sound source localization method for a smart speaker of this embodiment, by adopting the above scheme, the determination of the voice signal can further be made jointly by the voiceprint and the preset wake-up word. In this manner, different users of the same smart speaker can set different wake-up words, and the smart speaker can store the correspondence between each user's wake-up word and that user's voiceprint feature, so that each user can wake up and interact with the smart speaker only with his or her own private wake-up word. The smart speaker thus discriminates well between the voiceprint features and wake-up words of different voice signals, which not only improves the accuracy with which the smart speaker localizes voice signals but also greatly improves the user experience.
Fig. 5 is a structural diagram of a first embodiment of the smart speaker of the present invention. As shown in Fig. 5, the smart speaker of this embodiment may specifically include: a signal acquisition module 10, a time-difference acquisition module 11, and a localization module 12.
The signal acquisition module 10 is configured to, when it is determined that a voice signal emitted by a target sound source needs to be collected, obtain a first voice signal, carrying a preset wake-up word and emitted by the target sound source, as received by at least two pairs of signal receiving modules on the smart speaker; the preset wake-up word is used by the target sound source to wake up the smart speaker.
The time-difference acquisition module 11 is configured to obtain the time difference with which the two signal receiving modules of each pair obtained by the signal acquisition module 10 receive the first voice signal.
The localization module 12 is configured to determine, according to the first voice signal obtained by the signal acquisition module 10 and the time differences obtained by the time-difference acquisition module 11, the direction of the target sound source that emitted the first voice signal.
The smart speaker of this embodiment uses the above modules to realize sound source localization; the realization principle and technical effect are the same as those of the related method embodiments above, to whose description reference may be made, and they are not repeated here.
Fig. 6 is a structural diagram of a second embodiment of the smart speaker of the present invention. As shown in Fig. 6, the smart speaker of this embodiment, on the basis of the technical scheme of the embodiment shown in Fig. 5, may further include the following.
As shown in Fig. 6, the smart speaker of this embodiment may further include a positioning indication module 13.
The positioning indication module 13 is configured to rotate a positioning indicator mark to the direction of the target sound source localized by the localization module 12 and/or to light a positioning indicator lamp toward that direction, so as to inform the user corresponding to the target sound source that the direction of the target sound source has been determined.
Optionally, in the smart speaker of this embodiment, the time-difference acquisition module 11 is specifically configured to:
take a first pair of the at least two pairs of signal receiving modules as a reference and select a candidate direction θ of the target sound source;
for each pair of signal receiving modules, obtain, according to the candidate direction θ of the target sound source, the time difference t0 with which the two signal receiving modules of that pair receive the first voice signal, where t0 is a function of θ.
Optionally, in the smart speaker of this embodiment, the localization module 12 is specifically configured to:
align in time, according to the time difference t0 of each pair of signal receiving modules, the first voice signals received by the two signal receiving modules of that pair;
calculate, for each pair of signal receiving modules, the correlation of the two aligned first voice signals;
superimpose the correlations of all the pairs of signal receiving modules to obtain an overall correlation;
take, as the target direction of the target sound source, the candidate direction θ at which the overall correlation reaches its maximum.
Optionally, in the smart speaker of this embodiment, the localization module 12 is specifically configured to delay by the time difference t0 the first voice signal that is received first of the two first voice signals received by each pair of signal receiving modules, or to advance by the time difference t0 the first voice signal that is received later of the two, so that the two first voice signals are aligned in time.
Optionally, as shown in Fig. 6, the smart speaker of this embodiment further includes:
a determining module 14, configured to determine that the voice signal emitted by the target sound source needs to be collected.
Optionally, in the smart speaker of this embodiment, the determining module 14 is specifically configured to:
obtain the first voice signal, carrying the preset wake-up word, emitted by the target sound source;
extract the preset wake-up word from the first voice signal;
extract the voiceprint feature of the first voice signal;
judge, according to a pre-stored correspondence between the preset wake-up word and the voiceprint feature of the target sound source, whether the preset wake-up word matches the voiceprint feature of the first voice signal;
if they match, determine that the voice signal emitted by the target sound source needs to be collected.
Correspondingly, when the determining module 14 determines that the voice signal emitted by the target sound source needs to be collected, it triggers the signal acquisition module 10 to obtain the first voice signal, carrying the preset wake-up word and emitted by the target sound source, as received by the at least two pairs of signal receiving modules on the smart speaker.
Optionally, as shown in Fig. 6, the smart speaker of this embodiment further includes:
a receiving module 15, configured to receive a second voice signal, carrying the preset wake-up word, input by voice by the user corresponding to the target sound source;
an extraction module 16, configured to extract the preset wake-up word from the second voice signal received by the receiving module 15, and further configured to extract the voiceprint feature of that second voice signal as the voiceprint feature of the target sound source;
an establishing module 17, configured to establish and store the correspondence between the preset wake-up word extracted by the extraction module 16 and the voiceprint feature of the target sound source.
Correspondingly, the determining module 14 judges, according to the correspondence between the preset wake-up word and the voiceprint feature of the target sound source pre-stored by the establishing module 17, whether the preset wake-up word matches the voiceprint feature of the first voice signal.
The smart speaker of this embodiment uses the above modules to realize sound source localization; the realization principle and technical effect are the same as those of the related method embodiments above, to whose description reference may be made, and they are not repeated here.
Fig. 7 is a structural diagram of a third embodiment of the smart speaker of the present invention. As shown in Fig. 7, the smart speaker of this embodiment includes a plurality of microphones (not shown) for receiving and transmitting signals. For example, the microphones may be distributed uniformly on the housing of the smart speaker; they are used to receive the user's voice query and to report feedback information to the user based on that query. The smart speaker of this embodiment further includes one or more processors 30 and a memory 40 for storing one or more programs; when the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 implement the sound source localization method for a smart speaker of the embodiments shown in Figs. 1 to 4 above. The embodiment shown in Fig. 7 takes a plurality of processors 30 as an example.
For example, Fig. 8 is an exemplary diagram of a smart speaker provided by the present invention. Fig. 8 shows a block diagram of an exemplary smart speaker 12a suitable for implementing embodiments of the present invention. The smart speaker 12a shown in Fig. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 8, the smart speaker 12a of this embodiment takes the form of a general-purpose computing device. The components of the smart speaker 12a may include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a connecting the different system components (including the system memory 28a and the processors 16a).
The bus 18a represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include but are not limited to the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The intelligent sound box 12a typically comprises a variety of computer-readable media. These media may be any available media that can be accessed by the intelligent sound box 12a, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28a may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30a and/or a cache memory 32a. The intelligent sound box 12a may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34a may be used for reading from and writing to a non-removable, non-volatile magnetic medium (not shown in Fig. 8, commonly referred to as a "hard disk drive"). Although not shown in Fig. 8, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical medium) may also be provided. In such cases, each drive may be connected to the bus 18a through one or more data media interfaces. The system memory 28a may include at least one program product having a set of (for example, at least one) program modules configured to carry out the functions of the embodiments of Fig. 1 to Fig. 6 of the present invention described above.
A program/utility 40a having a set of (at least one) program modules 42a may be stored, for example, in the system memory 28a. Such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42a generally carry out the functions and/or methods of the embodiments of Fig. 1 to Fig. 6 of the present invention described above.
The intelligent sound box 12a may also communicate with one or more external devices 14a (such as a keyboard, a pointing device or a display 24a), with one or more devices that enable a user to interact with the intelligent sound box 12a, and/or with any device (such as a network card or a modem) that enables the intelligent sound box 12a to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22a. Moreover, the intelligent sound box 12a may communicate, through a network adapter 20a, with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet). As illustrated, the network adapter 20a communicates with the other modules of the intelligent sound box 12a through the bus 18a. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the intelligent sound box 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 16a executes the programs stored in the system memory 28a, thereby performing various functional applications and data processing, for example implementing the sound localization method of the intelligent sound box shown in the above embodiments.
The present invention further provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the sound localization method of the intelligent sound box shown in the above embodiments is implemented.
The computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a of the embodiment shown in Fig. 8 above.
With the development of technology, the transmission route of a computer program is no longer limited to tangible media; the program may also be downloaded directly from a network or obtained in other ways. Therefore, the computer-readable medium in this embodiment may include not only tangible media but also intangible media.
The computer-readable medium of this embodiment may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium, other than a computer-readable storage medium, that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on the computer-readable medium may be transmitted over any appropriate medium, including, but not limited to, wireless, wireline, optical cable, RF, or any suitable combination of the foregoing.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit, if implemented in the form of a software functional unit, may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (16)
1. A sound localization method of an intelligent sound box, characterized in that the method comprises:
if it is determined that a voice signal emitted by a target sound source needs to be collected, obtaining a first voice signal, carrying a preset wake-up word, that is emitted by the target sound source and received by at least two pairs of signal receiving modules on the intelligent sound box, wherein the preset wake-up word is used by the target sound source to wake up the intelligent sound box;
obtaining, for each pair of signal receiving modules, the time difference with which the two signal receiving modules of the pair receive the first voice signal;
determining, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the bearing of the target sound source that emitted the first voice signal.
2. The method according to claim 1, characterized in that, after determining, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the bearing of the target sound source that emitted the first voice signal, the method further comprises:
rotating a positioning indicator mark to the bearing of the target sound source and/or lighting a positioning indicator light toward the bearing of the target sound source, so as to inform the user corresponding to the target sound source that the bearing of the target sound source has been determined.
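As a minimal illustration of the indication step recited in claim 2 above, the sketch below maps an estimated bearing to the nearest of a ring of indicator lights. The number of lights (12) and the uniform ring layout are assumptions of the sketch; the claim itself only requires that an indicator mark be rotated to, or an indicator light be lit toward, the determined bearing.

```python
NUM_LEDS = 12  # assumed number of indicator lights evenly spaced around the housing

def led_index_for_bearing(bearing_deg):
    """Map a bearing in degrees (0-360) to the index of the nearest indicator
    light on the assumed 12-light ring."""
    step = 360.0 / NUM_LEDS
    return int(round((bearing_deg % 360.0) / step)) % NUM_LEDS

print(led_index_for_bearing(95.0))  # a source at 95 degrees lights LED 3
```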
3. The method according to claim 1, characterized in that obtaining, for each pair of signal receiving modules, the time difference with which the two signal receiving modules of the pair receive the first voice signal specifically comprises:
taking the first pair of the at least two pairs of signal receiving modules as a reference, selecting a candidate direction θ of the target sound source;
for each pair of signal receiving modules, obtaining, according to the candidate direction θ of the target sound source, the time difference t0 with which the two signal receiving modules of that pair receive the first voice signal, wherein t0 is a function of θ.
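Claim 3 only requires that t0 be a function of the candidate direction θ. One common way to obtain such a function, shown in the hedged sketch below, is a far-field (plane-wave) model with known microphone coordinates; the coordinates, the 343 m/s sound speed and the two example pairs are assumptions of the sketch rather than details of the disclosure.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def pair_delay(theta_deg, mic_a, mic_b):
    """Predicted time difference t0 (seconds) for one pair of signal receiving
    modules at 2-D positions mic_a and mic_b (metres), for a far-field wave
    arriving from candidate direction theta_deg.

    Returns t0 = t_a - t_b: positive when microphone a receives the wavefront
    later than microphone b."""
    theta = np.deg2rad(theta_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector toward source
    return float(np.dot(mic_b - mic_a, direction)) / SPEED_OF_SOUND

# Example: two assumed microphone pairs, one along each axis of the housing.
pairs = [
    (np.array([0.00, 0.00]), np.array([0.10, 0.00])),
    (np.array([0.00, 0.00]), np.array([0.00, 0.10])),
]
for mic_a, mic_b in pairs:
    print(pair_delay(60.0, mic_a, mic_b))  # t0 is a function of theta for each pair
```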
4. The method according to claim 3, characterized in that determining, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the bearing of the target sound source that emitted the first voice signal specifically comprises:
aligning in time, according to the time difference t0 with which each pair of signal receiving modules receives the first voice signal, the first voice signals received by the two signal receiving modules of that pair;
calculating, for each pair of signal receiving modules, the correlation between the two aligned first voice signals of the pair;
superimposing the correlations of all pairs of signal receiving modules to obtain an overall correlation;
taking the candidate direction θ of the target sound source at which the overall correlation reaches its maximum as the target direction of the target sound source.
5. The method according to claim 4, characterized in that aligning in time, according to the time difference t0 with which each pair of signal receiving modules receives the first voice signal, the first voice signals received by the two signal receiving modules of each pair specifically comprises:
for each pair of signal receiving modules, delaying by the time difference t0 the first voice signal that is received earlier of the two received first voice signals, or advancing by the time difference t0 the first voice signal that is received later of the two received first voice signals, so that the two first voice signals are aligned in time.
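Putting claims 3-5 together, a candidate direction is scored by aligning each pair's two first voice signals by the predicted t0, correlating them, and summing the per-pair correlations; the candidate direction with the largest overall correlation is taken as the target direction. The sketch below is one possible rendering of that search under assumed parameters: a 16 kHz sampling rate, integer-sample shifts, 2-D microphone coordinates and a simple dot-product correlation.

```python
import numpy as np

SAMPLE_RATE = 16000        # Hz, assumed
SPEED_OF_SOUND = 343.0     # m/s, assumed

def shift(signal, samples):
    """Delay (samples > 0) or advance (samples < 0) a signal, zero-padding the edge."""
    if samples == 0:
        return signal.copy()
    out = np.zeros_like(signal)
    if samples > 0:
        out[samples:] = signal[:len(signal) - samples]
    else:
        out[:samples] = signal[-samples:]
    return out

def localize(pair_signals, pair_geometry, candidates_deg):
    """Search over candidate directions in the manner recited in claims 3-5.

    pair_signals  : list of (sig_a, sig_b) equal-length 1-D arrays, one tuple per pair
    pair_geometry : list of (pos_a, pos_b) 2-D microphone coordinates in metres (assumed)
    candidates_deg: iterable of candidate directions theta in degrees
    """
    best_theta, best_score = None, -np.inf
    for theta_deg in candidates_deg:
        theta = np.deg2rad(theta_deg)
        direction = np.array([np.cos(theta), np.sin(theta)])
        total = 0.0
        for (sig_a, sig_b), (pos_a, pos_b) in zip(pair_signals, pair_geometry):
            # Predicted delay t0 = t_a - t_b for this pair and candidate direction.
            t0 = float(np.dot(pos_b - pos_a, direction)) / SPEED_OF_SOUND
            lag = int(round(t0 * SAMPLE_RATE))
            # Align the pair: advance whichever first voice signal arrived later
            # by t0 (claim 5 equivalently allows delaying the earlier one).
            aligned_a = shift(sig_a, -lag) if lag > 0 else sig_a
            aligned_b = shift(sig_b, lag) if lag < 0 else sig_b
            total += float(np.dot(aligned_a, aligned_b))  # per-pair correlation
        if total > best_score:  # keep the direction with the largest overall correlation
            best_theta, best_score = theta_deg, total
    return best_theta
```

For instance, `localize(pair_signals, pair_geometry, range(0, 360, 5))` would scan candidate directions in 5° steps.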
6. The method according to claim 1, characterized in that, before obtaining the first voice signal, carrying the preset wake-up word, that is emitted by the target sound source and received by the at least two pairs of signal receiving modules on the intelligent sound box, the method further comprises:
determining that the voice signal emitted by the target sound source needs to be collected;
further, determining that the voice signal emitted by the target sound source needs to be collected specifically comprises:
obtaining the first voice signal, carrying the preset wake-up word, emitted by the target sound source;
extracting the preset wake-up word from the first voice signal;
extracting the voiceprint feature of the first voice signal;
judging, according to the pre-stored correspondence between the preset wake-up word and the voiceprint feature of the target sound source, whether the preset wake-up word matches the voiceprint feature of the first voice signal;
if they match, determining that the voice signal emitted by the target sound source needs to be collected.
7. The method according to claim 6, characterized in that, before obtaining the first voice signal, carrying the preset wake-up word, emitted by the target sound source, the method further comprises:
receiving a second voice signal, carrying the preset wake-up word, that is input by speech from the user corresponding to the target sound source;
extracting the preset wake-up word from the second voice signal;
extracting the voiceprint feature of the second voice signal as the voiceprint feature of the target sound source;
establishing and storing the correspondence between the preset wake-up word and the voiceprint feature of the target sound source.
8. An intelligent sound box, characterized in that the intelligent sound box comprises:
a signal acquisition module configured, if it is determined that a voice signal emitted by a target sound source needs to be collected, to obtain a first voice signal, carrying a preset wake-up word, that is emitted by the target sound source and received by at least two pairs of signal receiving modules on the intelligent sound box, wherein the preset wake-up word is used by the target sound source to wake up the intelligent sound box;
a time difference acquisition module configured to obtain, for each pair of signal receiving modules, the time difference with which the two signal receiving modules of the pair receive the first voice signal;
a locating module configured to determine, according to the first voice signal and the time difference with which each pair of signal receiving modules receives the first voice signal, the bearing of the target sound source that emitted the first voice signal.
9. The intelligent sound box according to claim 8, characterized by further comprising:
a positioning indicating module configured to rotate a positioning indicator mark to the bearing of the target sound source and/or to light a positioning indicator light toward the bearing of the target sound source, so as to inform the user corresponding to the target sound source that the bearing of the target sound source has been determined.
10. The intelligent sound box according to claim 8, characterized in that the time difference acquisition module is specifically configured to:
take the first pair of the at least two pairs of signal receiving modules as a reference and select a candidate direction θ of the target sound source;
for each pair of signal receiving modules, obtain, according to the candidate direction θ of the target sound source, the time difference t0 with which the two signal receiving modules of that pair receive the first voice signal, wherein t0 is a function of θ.
11. The intelligent sound box according to claim 10, characterized in that the locating module is specifically configured to:
align in time, according to the time difference t0 with which each pair of signal receiving modules receives the first voice signal, the first voice signals received by the two signal receiving modules of that pair;
calculate, for each pair of signal receiving modules, the correlation between the two aligned first voice signals of the pair;
superimpose the correlations of all pairs of signal receiving modules to obtain an overall correlation;
take the candidate direction θ of the target sound source at which the overall correlation reaches its maximum as the target direction of the target sound source.
12. The intelligent sound box according to claim 11, characterized in that the locating module is specifically configured to delay, for each pair of signal receiving modules, by the time difference t0 the first voice signal that is received earlier of the two received first voice signals, or to advance by the time difference t0 the first voice signal that is received later of the two received first voice signals, so that the two first voice signals are aligned in time.
13. The intelligent sound box according to claim 8, characterized in that the intelligent sound box further comprises:
a determining module configured to determine that the voice signal emitted by the target sound source needs to be collected;
further, the determining module is specifically configured to:
obtain the first voice signal, carrying the preset wake-up word, emitted by the target sound source;
extract the preset wake-up word from the first voice signal;
extract the voiceprint feature of the first voice signal;
judge, according to the pre-stored correspondence between the preset wake-up word and the voiceprint feature of the target sound source, whether the preset wake-up word matches the voiceprint feature of the first voice signal;
if they match, determine that the voice signal emitted by the target sound source needs to be collected.
14. The intelligent sound box according to claim 13, characterized in that the intelligent sound box further comprises:
a receiving module configured to receive a second voice signal, carrying the preset wake-up word, that is input by speech from the user corresponding to the target sound source;
an extraction module configured to extract the preset wake-up word from the second voice signal;
the extraction module being further configured to extract the voiceprint feature of the second voice signal as the voiceprint feature of the target sound source;
an establishing module configured to establish and store the correspondence between the preset wake-up word and the voiceprint feature of the target sound source.
15. An intelligent sound box, comprising multiple microphones for receiving and transmitting signals, characterized in that the intelligent sound box further comprises:
one or more processors; and
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
16. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-7 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710647123.8A CN107705785A (en) | 2017-08-01 | 2017-08-01 | Sound localization method, intelligent sound box and the computer-readable medium of intelligent sound box |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107705785A true CN107705785A (en) | 2018-02-16 |
Family
ID=61170119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710647123.8A Pending CN107705785A (en) | 2017-08-01 | 2017-08-01 | Sound localization method, intelligent sound box and the computer-readable medium of intelligent sound box |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107705785A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104934033A (en) * | 2015-04-21 | 2015-09-23 | 深圳市锐曼智能装备有限公司 | Control method of robot sound source positioning and awakening identification and control system of robot sound source positioning and awakening identification |
CN106292732A (en) * | 2015-06-10 | 2017-01-04 | 上海元趣信息技术有限公司 | Intelligent robot rotating method based on sound localization and Face datection |
US20170094223A1 (en) * | 2015-09-24 | 2017-03-30 | Cisco Technology, Inc. | Attenuation of Loudspeaker in Microphone Array |
CN105467364A (en) * | 2015-11-20 | 2016-04-06 | 百度在线网络技术(北京)有限公司 | Method and apparatus for localizing target sound source |
CN106815507A (en) * | 2015-11-30 | 2017-06-09 | 中兴通讯股份有限公司 | Voice wakes up implementation method, device and terminal |
CN205412218U (en) * | 2015-12-19 | 2016-08-03 | 榆林学院 | Music robot |
CN205789081U (en) * | 2016-06-30 | 2016-12-07 | 广东职业技术学院 | A kind of singing robot |
CN106601245A (en) * | 2016-12-15 | 2017-04-26 | 北京塞宾科技有限公司 | Vehicle-mounted intelligent audio device and audio processing method |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108364642A (en) * | 2018-02-22 | 2018-08-03 | 成都启英泰伦科技有限公司 | A kind of sound source locking means |
CN108133704A (en) * | 2018-02-22 | 2018-06-08 | 成都启英泰伦科技有限公司 | A kind of sound source locking system |
CN108445451A (en) * | 2018-05-11 | 2018-08-24 | 四川斐讯信息技术有限公司 | A kind of intelligent sound box and its sound localization method |
CN108762104A (en) * | 2018-05-17 | 2018-11-06 | 江西午诺科技有限公司 | Speaker control method, device, readable storage medium storing program for executing and mobile terminal |
CN108962263B (en) * | 2018-06-04 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | A kind of smart machine control method and system |
CN108962263A (en) * | 2018-06-04 | 2018-12-07 | 百度在线网络技术(北京)有限公司 | A kind of smart machine control method and system |
CN108966067A (en) * | 2018-06-07 | 2018-12-07 | Oppo广东移动通信有限公司 | Control method for playing back and Related product |
CN108966077A (en) * | 2018-06-19 | 2018-12-07 | 四川斐讯信息技术有限公司 | A kind of control method and system of speaker volume |
CN108831471B (en) * | 2018-09-03 | 2020-10-23 | 重庆与展微电子有限公司 | Voice safety protection method and device and routing terminal |
CN108831471A (en) * | 2018-09-03 | 2018-11-16 | 与德科技有限公司 | A kind of voice method for security protection, device and route terminal |
CN111833862B (en) * | 2019-04-19 | 2023-10-20 | 佛山市顺德区美的电热电器制造有限公司 | Control method of equipment, control equipment and storage medium |
CN111833862A (en) * | 2019-04-19 | 2020-10-27 | 佛山市顺德区美的电热电器制造有限公司 | Control method of equipment, control equipment and storage medium |
CN110232916A (en) * | 2019-05-10 | 2019-09-13 | 平安科技(深圳)有限公司 | Method of speech processing, device, computer equipment and storage medium |
US11295740B2 (en) | 2019-08-22 | 2022-04-05 | Beijing Xiaomi Intelligent Technology Co., Ltd. | Voice signal response method, electronic device, storage medium and system |
CN110364161A (en) * | 2019-08-22 | 2019-10-22 | 北京小米智能科技有限公司 | Method, electronic equipment, medium and the system of voice responsive signal |
CN110794368B (en) * | 2019-10-28 | 2021-10-19 | 星络智能科技有限公司 | Sound source positioning method and device, intelligent sound box and storage medium |
CN110794368A (en) * | 2019-10-28 | 2020-02-14 | 星络智能科技有限公司 | Sound source positioning method and device, intelligent sound box and storage medium |
CN111412587A (en) * | 2020-03-31 | 2020-07-14 | 广东美的制冷设备有限公司 | Voice processing method and device of air conditioner, air conditioner and storage medium |
CN111541813A (en) * | 2020-04-09 | 2020-08-14 | 北京金茂绿建科技有限公司 | Audio playing method, electronic equipment and computer readable storage medium |
CN112104686B (en) * | 2020-04-27 | 2024-05-17 | 苏州触达信息技术有限公司 | Intelligent device and file transmission method between intelligent devices |
CN112104686A (en) * | 2020-04-27 | 2020-12-18 | 苏州触达信息技术有限公司 | Intelligent equipment and file transmission method between intelligent equipment |
CN112104928A (en) * | 2020-05-13 | 2020-12-18 | 苏州触达信息技术有限公司 | Intelligent sound box and method and system for controlling intelligent sound box |
CN113747092A (en) * | 2020-05-29 | 2021-12-03 | 深圳Tcl数字技术有限公司 | Sound channel playing method, system and storage medium |
CN112104810A (en) * | 2020-07-28 | 2020-12-18 | 苏州触达信息技术有限公司 | Panoramic photographing apparatus, panoramic photographing method, and computer-readable storage medium |
CN112201241A (en) * | 2020-09-28 | 2021-01-08 | 适居之家科技有限公司 | Intelligent voice bedside cabinet, voice processing method thereof and voice control system |
CN112346016A (en) * | 2020-10-28 | 2021-02-09 | 苏州触达信息技术有限公司 | Underwater personnel positioning method and wearable equipment |
CN112346016B (en) * | 2020-10-28 | 2023-11-28 | 苏州触达信息技术有限公司 | Positioning method for personnel in water and wearable equipment |
CN112908322A (en) * | 2020-12-31 | 2021-06-04 | 思必驰科技股份有限公司 | Voice control method and device for toy vehicle |
WO2024000853A1 (en) * | 2022-06-28 | 2024-01-04 | 歌尔科技有限公司 | Wearable device control method and apparatus, terminal device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107705785A (en) | Sound localization method, intelligent sound box and the computer-readable medium of intelligent sound box | |
US10706851B2 (en) | Server side hotwording | |
AU2013252518B2 (en) | Embedded system for construction of small footprint speech recognition with user-definable constraints | |
CN108962240A (en) | A kind of sound control method and system based on earphone | |
US20170162198A1 (en) | Method and apparatus for executing voice command in electronic device | |
CN109243428B (en) | A kind of method that establishing speech recognition modeling, audio recognition method and system | |
CN109036396A (en) | A kind of exchange method and system of third-party application | |
CN108520743A (en) | Sound control method, smart machine and the computer-readable medium of smart machine | |
EP3526789B1 (en) | Voice capabilities for portable audio device | |
CN106210239A (en) | The maliciously automatic identifying method of caller's vocal print, device and mobile terminal | |
CN107886944A (en) | A kind of audio recognition method, device, equipment and storage medium | |
CN108133707A (en) | A kind of content share method and system | |
JP2008547061A (en) | Context-sensitive communication and translation methods to enhance interaction and understanding between different language speakers | |
CN112735418B (en) | Voice interaction processing method, device, terminal and storage medium | |
CN102413100A (en) | Voice-print authentication system having voice-print password picture prompting function and realization method thereof | |
CN105719659A (en) | Recording file separation method and device based on voiceprint identification | |
CN104581221A (en) | Video live broadcasting method and device | |
CN107220532A (en) | For the method and apparatus by voice recognition user identity | |
US20090216525A1 (en) | System and method for treating homonyms in a speech recognition system | |
CN102413101A (en) | Voice-print authentication system having voice-print password voice prompting function and realization method thereof | |
CN113129867B (en) | Training method of voice recognition model, voice recognition method, device and equipment | |
CN109785846A (en) | The role recognition method and device of the voice data of monophonic | |
CN106341539A (en) | Automatic evidence obtaining method of malicious caller voiceprint, apparatus and mobile terminal thereof | |
CN107943724A (en) | Method and device for searching external device, terminal device and storage medium | |
CN108831449A (en) | A kind of data interaction system method and system based on intelligent sound box |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180216 |