CN107820037A - Method, device and system for audio signal and image processing - Google Patents
- Publication number
- CN107820037A; CN201610826122.5A
- Authority
- CN
- China
- Prior art keywords
- angle
- detected
- microphone array
- audio signal
- calculated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/22—Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention provides a method, device and system for audio signal and image processing. The audio signals collected by multiple microphones are processed with a first preset algorithm to obtain a first predicted position of an object to be detected; the historical positions of the object are filtered with a second preset algorithm and used to calculate a second predicted position; and the first and second predicted positions are combined and corrected according to the temporal continuity of the audio signal to obtain the current position of the object. This solves the problem that, for lack of a technique for tracking the speaker's position, a network conferencing system cannot display the speaker's position in time or track and acquire the speaker's multimedia information, and it achieves the effect of obtaining the speaker's position in time and tracking and acquiring the speaker's multimedia information.
Description
Technical field
The present invention relates to the field of speech recognition applications, and in particular to a method, device and system for audio signal and image processing.
Background
With the rapid development of video communication technology, remote video conferencing services are increasingly common. When a remote video conferencing system is in use, how to locate a speaker from the speaker's voice and show the speaker on the display device has become a problem to be solved in such systems.
In the related art, for lack of a technique for tracking the speaker's position, a network conferencing system cannot display the speaker's position in time or track and acquire the speaker's multimedia information. No effective solution to this problem has yet been proposed.
Summary of the invention
Embodiments of the invention provide a method, device and system for audio signal and image processing, to at least solve the problem in the related art that, for lack of a technique for tracking the speaker's position, a network conferencing system cannot display the speaker's position in time or track and acquire the speaker's multimedia information.
According to one embodiment of the invention, a method of audio signal processing is provided, including: processing the audio signals collected by multiple microphones according to a first preset algorithm to obtain a first predicted position of an object to be detected; filtering the historical positions of the object to be detected according to a second preset algorithm and calculating a second predicted position of the object to be detected; and combining the first predicted position and the second predicted position and correcting them according to the temporal continuity of the audio signal to obtain the current position of the object to be detected.
Optionally, processing the audio signals collected by the multiple microphones according to the first preset algorithm to obtain the first predicted position of the object to be detected includes: dividing the microphones into a first microphone array and a second microphone array; calculating, according to the first preset algorithm, a first angle between the object to be detected and the first microphone array and a second angle between the object to be detected and the second microphone array; and calculating the first predicted position of the object to be detected from the first angle and the second angle according to a preset trigonometric function.
Further, optionally, calculating the first angle between the object to be detected and the first microphone array according to the first preset algorithm includes: where the first preset algorithm is the time difference of arrival (TDOA) algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the first microphone array; calculating a set of estimates of the first angle from the relation between that Euclidean distance and the first angle; and calculating the mean of the set of estimates and taking the mean as the first angle.
Optionally, calculating the second angle between the object to be detected and the second microphone array according to the first preset algorithm includes: where the first preset algorithm is the TDOA algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the second microphone array; calculating a set of estimates of the second angle from the relation between that Euclidean distance and the second angle; and calculating the mean of the set of estimates and taking the mean as the second angle.
Optionally, filtering the historical positions of the object to be detected according to the second preset algorithm and calculating the second predicted position of the object to be detected includes: calculating, by the first preset algorithm, a first set of estimates of a first predicted angle for the first microphone array and a second set of estimates of a second predicted angle for the second microphone array; where the second preset algorithm is a Kalman filtering algorithm, judging by the Kalman filtering algorithm whether the first set of estimates and the second set of estimates each meet a preset condition; determining the first angle and the second angle according to the judgment result; and calculating the second predicted position of the object to be detected from the first angle and the second angle according to the preset trigonometric function.
Further, optionally, after the current position of the object to be detected is obtained, the method also includes: updating the Kalman filter parameters according to the current position of the object to be detected.
Further, optionally, after the current position of the object to be detected is obtained, the method also includes: enhancing the voice output of the object to be detected.
According to another embodiment of the invention, a method of image processing is provided, including: obtaining, via preset microphone arrays, a first depth value between the first microphone array and the image acquisition device of the display device, and a second depth value between the second microphone array and the image acquisition device of the display device; calculating a first-type angle between the first microphone array and the image acquisition device corresponding to the first depth value, and a second-type angle between the second microphone array and the image acquisition device corresponding to the second depth value; building a multi-dimensional spatial coordinate system from the first depth value, the second depth value, the first-type angle and the second-type angle; and obtaining the position of the object to be detected and determining its position in the multi-dimensional spatial coordinate system.
Optionally, calculating the first-type angle and the second-type angle includes: calculating the first-type angle and the second-type angle according to a preset condition on the first depth, the second depth and the actual distance.
According to still another embodiment of the invention, a device for audio signal processing is provided, including: a first calculation module, configured to process the audio signals collected by multiple microphones according to a first preset algorithm to obtain a first predicted position of an object to be detected; a second calculation module, configured to filter the historical positions of the object to be detected according to a second preset algorithm and calculate a second predicted position of the object to be detected; and a correction module, configured to combine the first predicted position and the second predicted position and correct them according to the temporal continuity of the audio signal to obtain the current position of the object to be detected.
According to a further embodiment of the invention, a device for image processing is provided, including: obtaining, via preset microphone arrays, a first depth value between the first microphone array and the image acquisition device of the display device, and a second depth value between the second microphone array and the image acquisition device of the display device; a calculation module, configured to calculate a first-type angle between the first microphone array and the image acquisition device corresponding to the first depth value, and a second-type angle between the second microphone array and the image acquisition device corresponding to the second depth value; a coordinate space module, configured to build a multi-dimensional spatial coordinate system from the first depth value, the second depth value, the first-type angle and the second-type angle; and an acquisition module, configured to obtain the position of the object to be detected and determine its position in the multi-dimensional spatial coordinate system.
According to one embodiment of the invention, a system for voice and image processing is provided, including: a video conference terminal, an image acquisition device, a depth image acquisition device, a sound acquisition module composed of multiple microphone arrays, and a display device. The sound acquisition module composed of multiple microphone arrays collects the audio signal of the object to be detected. The image acquisition device captures all video images of the conference site. The depth image acquisition device captures depth images of the conference site, which are used to obtain the position of each participant relative to the depth image acquisition device. The video conference terminal tracks the positions of participants, displays the image of a participant while that participant is speaking, and records the meeting minutes.
According to still another embodiment of the invention, a storage medium is also provided. The storage medium is configured to store program code for performing the following steps: processing the audio signals collected by multiple microphones according to a first preset algorithm to obtain a first predicted position of an object to be detected; filtering the historical positions of the object to be detected according to a second preset algorithm and calculating a second predicted position of the object to be detected; and combining the first predicted position and the second predicted position and correcting them according to the temporal continuity of the audio signal to obtain the current position of the object to be detected.
Optionally, the storage medium is also configured to store program code for performing the following steps. Processing the audio signals collected by the multiple microphones according to the first preset algorithm to obtain the first predicted position of the object to be detected includes: dividing the microphones into a first microphone array and a second microphone array; calculating, according to the first preset algorithm, a first angle between the object to be detected and the first microphone array and a second angle between the object to be detected and the second microphone array; and calculating the first predicted position of the object to be detected from the first angle and the second angle according to a preset trigonometric function.
Further, optionally, the storage medium is also configured to store program code for performing the following steps. Calculating the first angle between the object to be detected and the first microphone array according to the first preset algorithm includes: where the first preset algorithm is the time difference of arrival (TDOA) algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the first microphone array; calculating a set of estimates of the first angle from the relation between that Euclidean distance and the first angle; and calculating the mean of the set of estimates and taking the mean as the first angle.
Optionally, the storage medium is also configured to store program code for performing the following steps. Calculating the second angle between the object to be detected and the second microphone array according to the first preset algorithm includes: where the first preset algorithm is the TDOA algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the second microphone array; calculating a set of estimates of the second angle from the relation between that Euclidean distance and the second angle; and calculating the mean of the set of estimates and taking the mean as the second angle.
Optionally, the storage medium is also configured to store program code for performing the following steps. Filtering the historical positions of the object to be detected according to the second preset algorithm and calculating the second predicted position of the object to be detected includes: calculating, by the first preset algorithm, a first set of estimates of a first predicted angle for the first microphone array and a second set of estimates of a second predicted angle for the second microphone array; where the second preset algorithm is a Kalman filtering algorithm, judging by the Kalman filtering algorithm whether the first set of estimates and the second set of estimates each meet a preset condition; determining the first angle and the second angle according to the judgment result; and calculating the second predicted position of the object to be detected from the first angle and the second angle according to the preset trigonometric function.
Further, optionally, the storage medium is also configured to store program code for performing the following step: after the current position of the object to be detected is obtained, updating the Kalman filter parameters according to the current position of the object to be detected.
Further, optionally, the storage medium is also configured to store program code for performing the following step: after the current position of the object to be detected is obtained, enhancing the voice output of the object to be detected.
With the present invention, the audio signals collected by multiple microphones are processed according to the first preset algorithm to obtain the first predicted position of the object to be detected; the historical positions of the object to be detected are filtered according to the second preset algorithm and used to calculate the second predicted position; and the first and second predicted positions are combined and corrected according to the temporal continuity of the audio signal to obtain the current position of the object to be detected. This solves the problem that, for lack of a technique for tracking the speaker's position, a network conferencing system cannot display the speaker's position in time or track and acquire the speaker's multimedia information, and it achieves the effect of obtaining the speaker's position in time and tracking and acquiring the speaker's multimedia information.
Brief description of the drawings
The drawings described here are provided for a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the audio signal processing method according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the positional relationship between the two microphone arrays and the speaker in the audio signal processing method according to an embodiment of the invention;
Fig. 3 is a schematic diagram of calculating the speaker's position relative to the microphone arrays in the audio signal processing method according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the TDOA algorithm in the audio signal processing method according to an embodiment of the invention;
Fig. 5 is a principle diagram of TDOA localization using multiple microphones in the audio signal processing method according to an embodiment of the invention;
Fig. 6 is a flowchart of the image processing method according to an embodiment of the invention;
Fig. 7 is a layout diagram of the system devices for the image processing method according to an embodiment of the invention;
Fig. 8 is a principle diagram of measuring the distance to the television with the microphone arrays in the image processing method according to an embodiment of the invention;
Fig. 9 is a schematic diagram of calculating, from depth information, the angle between the depth axis of the depth camera and the line joining the microphone arrays in the image processing method according to an embodiment of the invention;
Fig. 10 is a structural diagram of the audio signal processing device according to an embodiment of the invention;
Fig. 11 is a structural diagram of the image processing device according to an embodiment of the invention;
Fig. 12 is a structural diagram of the audio signal and image processing system according to an embodiment of the invention;
Fig. 13 is a schematic diagram of a method for displaying the text corresponding to speech of interest.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another.
Note that the terms "first", "second" and the like in the description, the claims and the drawings are used to distinguish similar objects, not to describe a specific order or sequence.
Technical term used in the invention:
TDOA: time difference of arrival.
Embodiment 1
Fig. 1 is a flowchart of the audio signal processing method according to an embodiment of the invention. As shown in Fig. 1, the flow includes the following steps:
Step S102: process the audio signals collected by multiple microphones according to the first preset algorithm to obtain the first predicted position of the object to be detected;
Step S104: filter the historical positions of the object to be detected according to the second preset algorithm and calculate the second predicted position of the object to be detected;
Step S106: combine the first predicted position and the second predicted position and correct them according to the temporal continuity of the audio signal to obtain the current position of the object to be detected.
Through the above steps, the audio signals collected by multiple microphones are processed according to the first preset algorithm to obtain the first predicted position of the object to be detected; the historical positions of the object to be detected are filtered according to the second preset algorithm and used to calculate the second predicted position; and the first and second predicted positions are combined and corrected according to the temporal continuity of the audio signal to obtain the current position of the object to be detected. This solves the problem that, for lack of a technique for tracking the speaker's position, a network conferencing system cannot display the speaker's position in time or track and acquire the speaker's multimedia information, and it achieves the effect of obtaining the speaker's position in time and tracking and acquiring the speaker's multimedia information.
The audio signal processing method provided by the embodiments of this application is applicable to sound source tracking and localization. Sound source localization has considerable application prospects and practical value: for example, it can be used to detect the speaker's position and automatically focus the video image on the speaker, so that listeners can observe the speaker better and even notice subtle facial expressions. This gives listeners a stronger sense of presence and helps them better understand and appreciate what the speaker wants to express.
Optionally, processing the audio signals collected by the multiple microphones according to the first preset algorithm in step S102 to obtain the first predicted position of the object to be detected includes:
Step1: divide the microphones into a first microphone array and a second microphone array;
Step2: calculate, according to the first preset algorithm, a first angle between the object to be detected and the first microphone array, and a second angle between the object to be detected and the second microphone array;
Step3: calculate the first predicted position of the object to be detected from the first angle and the second angle according to a preset trigonometric function.
Further, optionally, calculating the first angle between the object to be detected and the first microphone array according to the first preset algorithm in Step2 includes:
Where the first preset algorithm is the time difference of arrival (TDOA) algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the first microphone array; calculating a set of estimates of the first angle from the relation between that Euclidean distance and the first angle; and calculating the mean of the set of estimates and taking the mean as the first angle.
Optionally, calculating the second angle between the object to be detected and the second microphone array according to the first preset algorithm in Step2 includes: where the first preset algorithm is the TDOA algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the second microphone array; calculating a set of estimates of the second angle from the relation between that Euclidean distance and the second angle; and calculating the mean of the set of estimates and taking the mean as the second angle.
Optionally, filtering the historical positions of the object to be detected according to the second preset algorithm in step S104 and calculating the second predicted position of the object to be detected includes:
Calculating, by the first preset algorithm, a first set of estimates of a first predicted angle for the first microphone array and a second set of estimates of a second predicted angle for the second microphone array; where the second preset algorithm is a Kalman filtering algorithm, judging by the Kalman filtering algorithm whether the first set of estimates and the second set of estimates each meet a preset condition; determining the first angle and the second angle according to the judgment result; and calculating the second predicted position of the object to be detected from the first angle and the second angle according to the preset trigonometric function.
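The filtering step above can be illustrated with a minimal sketch. The text does not give the state model, the noise parameters or the preset condition, so everything below is an assumption for illustration: a scalar Kalman filter with a constant-angle model, and a gate that accepts a new angle estimate only if it lies within a few standard deviations of the prediction. The names `ScalarKalman`, `accepts` and `track` are hypothetical.

```python
class ScalarKalman:
    """One-dimensional Kalman filter for a slowly varying angle (degrees).

    Assumed model, not taken from the source: the angle is constant between
    frames, with additive process and measurement noise.
    """

    def __init__(self, angle, variance=1.0, process_var=0.01, meas_var=1.0):
        self.angle = angle
        self.variance = variance
        self.process_var = process_var
        self.meas_var = meas_var

    def predict(self):
        # Constant-angle model: the estimate is unchanged, uncertainty grows.
        self.variance += self.process_var
        return self.angle

    def accepts(self, measurement, gate=3.0):
        # Hypothetical "preset condition": the innovation must lie within
        # `gate` standard deviations of the predicted angle.
        innovation_std = (self.variance + self.meas_var) ** 0.5
        return abs(measurement - self.angle) <= gate * innovation_std

    def update(self, measurement):
        # Standard scalar Kalman update of the estimate and its variance.
        gain = self.variance / (self.variance + self.meas_var)
        self.angle += gain * (measurement - self.angle)
        self.variance *= 1.0 - gain
        return self.angle


def track(measurements, start):
    """Filter a sequence of angle estimates, skipping those that fail the gate."""
    kf = ScalarKalman(start)
    smoothed = []
    for z in measurements:
        kf.predict()
        if kf.accepts(z):
            kf.update(z)
        smoothed.append(kf.angle)
    return smoothed
```

Under these assumptions, one such filter would run per microphone array, and the two filtered angles would then be passed to the trigonometric step to produce the second predicted position.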
Further, optionally, after the current position of the object to be detected is obtained in step S106, the audio signal processing method provided by the embodiment of the invention also includes: updating the Kalman filter parameters according to the current position of the object to be detected.
Further, optionally, after the current position of the object to be detected is obtained in step S106, the audio signal processing method provided by the embodiment of the invention also includes: enhancing the voice output of the object to be detected.
To sum up, the method for Audio Signal Processing provided in an embodiment of the present invention is specific as follows:
Fig. 2 be Audio Signal Processing according to embodiments of the present invention method in two microphone arrays closed with speaker position
It is schematic diagram, as illustrated in fig. 2, it is assumed that some meeting-place shares two microphone arrays MicA and MicB, each microphone array respectively has four
Individual Mike, microphone array MicA/MicB is regarded as the set of Mike, i.e. MicA={ MicA0, MicA1, MicA2, MicA3 },
And MicB={ MicB0, MicB1, MicB2, MicB3 }.During general video conference in some period of some meeting-place only
One people's speech, therefore we first assume that some meeting-place is spoken in t, one-man, speaker is with respect to MicA and MicB
Position relationship it is as shown in Figure 2.
The angles between the speaker and MicA and MicB are then nonzero. Let the angle between the speaker and MicA be θ0 and the angle between the speaker and MicB be θ1. Because the distance between MicA and MicB is known, the speaker's position is easily obtained by triangulation, as shown in Fig. 3. Fig. 3 is a schematic diagram of calculating the speaker's position relative to the microphone arrays in the audio signal processing method according to an embodiment of the present invention.
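The triangulation described above can be sketched as follows. This is a minimal illustration, not the patent's own formula: it assumes MicA sits at (0, 0), MicB at (spacing, 0), and that θ0 and θ1 are the interior angles of the triangle at MicA and MicB, measured from the line connecting the two arrays. The function name `triangulate` is hypothetical.

```python
import math

def triangulate(theta0, theta1, spacing):
    """Locate the speaker from the angles (radians) at MicA and MicB.

    Assumed layout for this sketch: MicA at (0, 0), MicB at (spacing, 0),
    theta0 and theta1 are interior angles measured from the MicA-MicB line.
    Neither angle may be exactly 90 degrees, because tan() is undefined there.
    """
    t0, t1 = math.tan(theta0), math.tan(theta1)
    # Ray from MicA: y = x * t0.  Ray from MicB: y = (spacing - x) * t1.
    x = spacing * t1 / (t0 + t1)
    y = spacing * t0 * t1 / (t0 + t1)
    return x, y

# Speaker seen at 45 degrees from both arrays, arrays 2 m apart.
x, y = triangulate(math.radians(45), math.radians(45), 2.0)
print(round(x, 6), round(y, 6))
```

With equal angles the speaker lies on the perpendicular bisector of the two arrays, which is a quick sanity check for the geometry.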
The angles θ0 and θ1 between the speaker and MicA/MicB can be obtained with a time-delay estimation algorithm such as TDOA (time difference of arrival), as shown in Fig. 4. Fig. 4 is a schematic diagram of the TDOA algorithm in the audio signal processing method according to an embodiment of the present invention.
Assume that the propagation speed of sound is fixed at γ, that the angle between the sound source and the line MicA0/MicA1 is θ0 (the line between MicA0 and MicA1 is parallel to the line between MicA and MicB), and that the spacing between MicA0 and MicA1 is l0. Because the sound source is at different distances from MicA0 and MicA1, the sound arrives at MicA1 and MicA0 at different times. The time difference Δt is:
$$\Delta t = \frac{l_0 \cos\theta_0}{\gamma}$$
This difference appears at microphones MicA0 and MicA1 as a time delay of the voice sequence sampled by MicA0 relative to the one sampled by MicA1. Assume that the sampling rate of MicA0 and MicA1 is S; the maximum delay between MicA0 and MicA1 does not exceed l0/γ. Under this constraint, let the voice sequence sampled by MicA0 be X = {x0, x1, x2, …, xn} and the voice sequence sampled by MicA1 be Y = {y0, y1, y2, …, yn}. Shifting X by an offset μ ∈ [-S*l0/γ, S*l0/γ] gives X' = {x0+μ, x1+μ, x2+μ, …, xn+μ}, where i+μ is the sample index, and the Euclidean distance between X' and Y is:
$$\delta = \sqrt{\sum_{i=0}^{n} \left(x_{i+\mu} - y_i\right)^2}$$
Over μ ∈ [-S*l0/γ, S*l0/γ], δ has a minimum δmin, attained at the offset μ|δmin. From μ|δmin, the angle θ0 between the speaker and the line connecting MicA0 and MicA1 can be obtained:
$$\theta_0 = \arccos\left(\frac{\gamma \, \mu|_{\delta_{\min}}}{S \, l_0}\right)$$
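The offset search for one microphone pair can be sketched as below. It is a simplified illustration under stated assumptions: offsets are whole samples, both sequences have the same length, the speed of sound defaults to 343 m/s, and the angle is recovered by inverting Δt = l0·cos θ0/γ with Δt = μ/S. The function name `estimate_pair_angle` and its parameters are hypothetical.

```python
import math

def estimate_pair_angle(x, y, sample_rate, spacing, speed=343.0):
    """Return (angle in radians, best offset in samples) for one microphone pair.

    x, y: equally long sample sequences from the two microphones.
    The offset mu is searched over [-S*l0/gamma, S*l0/gamma] in whole samples,
    and the mu that minimises the Euclidean distance between the shifted x
    and y is converted to an angle.
    """
    max_shift = int(sample_rate * spacing / speed)
    n = len(y) - 2 * max_shift  # samples compared at every offset
    if len(x) != len(y) or n <= 0:
        raise ValueError("sequences must be equally long and longer than 2*max_shift")
    best_mu, best_dist = 0, float("inf")
    for mu in range(-max_shift, max_shift + 1):
        dist = math.sqrt(sum(
            (x[max_shift + i + mu] - y[max_shift + i]) ** 2 for i in range(n)
        ))
        if dist < best_dist:
            best_mu, best_dist = mu, dist
    # cos(theta) = gamma * mu / (S * l0), clamped against rounding error.
    cos_theta = speed * best_mu / (sample_rate * spacing)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta), best_mu
```

A real implementation would more likely use cross-correlation with sub-sample interpolation; the brute-force distance search is kept here because it mirrors the description in the text.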
MicA has four microphones {MicA0, MicA1, MicA2, MicA3}, which form 6 microphone pairs: {MicA0, MicA1}, {MicA1, MicA2}, {MicA2, MicA3}, {MicA0, MicA2}, {MicA1, MicA3} and {MicA0, MicA3}, as shown in Fig. 4. The 6 microphone pairs yield a set of estimates of the speaker direction {θ0,0, θ0,1, θ0,2, θ0,3, θ0,4, θ0,5}, and their mean is taken as the prediction of the speaker direction. Experimental verification shows that a deviation of 5° is permissible. Fig. 5 is a schematic diagram of joint multi-microphone-pair TDOA localization in the audio signal processing method according to an embodiment of the present invention.
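As a small illustration of the pairing and averaging step, the sketch below enumerates the pairs of a four-microphone array and averages per-pair direction estimates. The helper names `microphone_pairs` and `mean_direction` are hypothetical, and the sample estimates are made-up numbers used only to exercise the code. It assumes every per-pair estimate has already been converted to an angle relative to the same reference line.

```python
from itertools import combinations

def microphone_pairs(names):
    """All unordered pairs of microphones in one array."""
    return list(combinations(names, 2))

def mean_direction(estimates):
    """Arithmetic mean of the per-pair direction estimates (degrees)."""
    return sum(estimates) / len(estimates)

pairs = microphone_pairs(["MicA0", "MicA1", "MicA2", "MicA3"])
print(len(pairs))  # four microphones give C(4, 2) pairs

# Illustrative estimates only, one per pair.
print(mean_direction([58.0, 60.0, 62.0, 61.0, 59.0, 60.0]))
```

A plain arithmetic mean is adequate while the estimates stay well away from the 0°/360° wrap-around; a circular mean would be the safer choice otherwise.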
Applying the same algorithm to the four microphones of MicB, {MicB0, MicB1, MicB2, MicB3}, yields the prediction of θ1. The prediction of the speaker position is then obtained from Fig. 3 by a simple trigonometric calculation.
With the above algorithm, a series of predictions of the speaker position can be obtained over a period of time. However, because of noise and other interference, the predictions obtained by this algorithm are not accurate enough. We therefore designed a Kalman-filter-based predictor that tracks the speaker position and serves as a constraint on the prediction of the speaker's direction angle, improving the accuracy of joint multi-microphone-pair TDOA localization.
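The source does not state which state model the Kalman filter uses. As a hedged illustration only, the sketch below tracks a single coordinate (or a single angle) with a scalar Kalman filter under a random-walk assumption, i.e. the speaker is assumed to stay roughly where they were. The class name `ScalarKalman` and the noise parameters `q` and `r` are hypothetical.

```python
class ScalarKalman:
    """Scalar Kalman filter with a random-walk state model (an assumption)."""

    def __init__(self, x0, p0=1.0, q=0.01, r=0.25):
        self.x = x0  # state estimate, e.g. one coordinate of the speaker
        self.p = p0  # variance of the estimate
        self.q = q   # process noise variance (how fast the speaker may move)
        self.r = r   # measurement noise variance (how noisy TDOA results are)

    def predict(self):
        """Predict the current state from history; uncertainty grows by q."""
        self.p += self.q
        return self.x

    def update(self, z):
        """Fold in a new measurement z and return the corrected estimate."""
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x

kf = ScalarKalman(x0=0.0, p0=1.0, q=0.0, r=1.0)
kf.predict()
print(kf.update(2.0))  # gain is 0.5 here, so the estimate moves halfway to 2.0
```

One such filter per coordinate is enough for a sketch; a constant-velocity model with a state vector would track a moving speaker better.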
Step 1: Predict the speaker's position at the current time with the Kalman filter, and convert it into predicted angles relative to the line between MicA and MicB.
Step 2: For each microphone array, use the TDOA algorithm to calculate the time delay of each of its microphone pairs and convert it into an angle relative to the line between MicA and MicB, obtaining a set of estimates of the speaker direction: {θi,0, θi,1, θi,2, θi,3, θi,4, θi,5}.
Step 3: If the Kalman prediction deviates too much from these estimates, it is discarded, and the mean of the estimates is used directly as the prediction of the speaker direction at the current time. Otherwise, the Kalman prediction is considered acceptable and the estimates that deviate from it are excluded, leaving Uθ = {θ'i,0, θ'i,1, …, θ'i,n-1} with 1 <= n <= 6; the mean of the remaining estimates is then used as the prediction of the speaker direction at the current time.
Step 4: Perform step 2 and step 3 for both microphone arrays to obtain the prediction of the speaker direction at the current time for each array, obtain the speaker position by a simple trigonometric calculation, and update the Kalman filter parameters.
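The decision in step 3 can be sketched as follows. The two thresholds are assumptions introduced only for this illustration (`reject_deg` for discarding the Kalman prediction, `outlier_deg` for excluding individual estimates), and the function name `fuse_direction` is hypothetical. Angles are in degrees relative to the line between MicA and MicB.

```python
def fuse_direction(predicted, estimates, reject_deg=20.0, outlier_deg=5.0):
    """Combine a Kalman-predicted angle with per-pair TDOA estimates.

    predicted: angle predicted by the Kalman filter for this array.
    estimates: per-pair TDOA angle estimates for the same array.
    Both thresholds are illustrative assumptions, not values from the source.
    """
    mean_all = sum(estimates) / len(estimates)
    # Kalman prediction too far from the measurements: discard it.
    if abs(predicted - mean_all) > reject_deg:
        return mean_all
    # Otherwise keep only the estimates consistent with the prediction.
    kept = [e for e in estimates if abs(e - predicted) <= outlier_deg]
    if not kept:  # nothing survived the gate, fall back to the plain mean
        return mean_all
    return sum(kept) / len(kept)

# One outlier (90) is excluded because the prediction (60) is trusted.
print(fuse_direction(60.0, [58.0, 61.0, 62.0, 59.0, 90.0, 60.0]))
# The prediction (10) is far from every measurement, so it is discarded.
print(fuse_direction(10.0, [58.0, 61.0, 62.0, 59.0, 60.0, 60.0]))
```

The fused angles of the two arrays would then be triangulated into a position, and that position fed back to the Kalman filter as its next measurement.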
Embodiment 2
Fig. 6 is a flowchart of the image processing method according to an embodiment of the present invention. As shown in Fig. 6, the flow includes the following steps:
Step S602: obtain, by means of preset microphone arrays, a first depth value between the first microphone array and the image capture device of the display device, and a second depth value between the second microphone array and the image capture device of the display device;
Step S604: calculate a first-type angle between the first microphone array corresponding to the first depth value and the image capture device, and calculate a second-type angle between the second microphone array corresponding to the second depth value and the image capture device;
Step S606: construct a multidimensional spatial coordinate system from the first depth value, the second depth value, the first-type angle and the second-type angle;
Step S608: obtain the position of the object to be detected, and determine the position of the object to be detected in the multidimensional spatial coordinate system.
Through the above steps, a first depth value between the first microphone array and the image capture device of the display device and a second depth value between the second microphone array and the image capture device of the display device are obtained by means of the preset microphone arrays; the first-type angle between the first microphone array corresponding to the first depth value and the image capture device, and the second-type angle between the second microphone array corresponding to the second depth value and the image capture device, are calculated; a multidimensional spatial coordinate system is constructed from the first depth value, the second depth value, the first-type angle and the second-type angle; and the position of the object to be detected is obtained and located in the multidimensional spatial coordinate system. This solves the problem that, for lack of a technique for tracking the speaker's position, a network conference system cannot show the speaker's position in time or track and obtain the speaker's multimedia information, and achieves the effect of obtaining the speaker's position in time and tracking and obtaining the speaker's multimedia information.
Optionally, in step S604, calculating the first-type angle between the first microphone array corresponding to the first depth value and the image capture device, and calculating the second-type angle between the second microphone array corresponding to the second depth value and the image capture device, includes: calculating the first-type angle and the second-type angle according to a preset condition relating the first depth and the second depth to the actual distance.
In summary, the image processing method provided by the embodiment of the present application is as follows:
The system requires that the relative positions of the microphone arrays, the depth camera, the image camera and the TV be fixed, as shown in Fig. 7. Fig. 7 is a device layout diagram of the system for the image processing method according to an embodiment of the present invention.
The spacing between MicA and MicB in the system is known and is typically 2 to 3 meters; the width of the TV can be measured; and the TV is kept level with the line between MicA and MicB. The distance between the TV and the line between MicA and MicB is unknown and depends on how the equipment is placed in the meeting room. When the system is installed for the first time, the video conference device has the TV play a prerecorded segment of speech, and the joint multi-microphone-pair TDOA algorithm described above estimates the position (direction and distance) of the TV relative to MicA and MicB, as shown in Fig. 8. Fig. 8 is a schematic diagram of measuring the distance to the TV with the microphone arrays in the image processing method according to an embodiment of the present invention.
Because the microphone arrays have a distinctive shape and color, they can be recognized in the image camera, and the corresponding depth information can then be obtained from the depth camera. Assume that the depth of MicA is Depth0 and the depth of MicB is Depth1; the angle of the camera relative to MicA and MicB can then be calculated using trigonometric functions.
Fig. 9 is a schematic diagram of calculating, from depth information, the angle between the depth axis of the depth camera and the line connecting the microphone arrays in the image processing method according to an embodiment of the present invention. As shown in Fig. 9, microphone-array localization yields a direction and position relative to the microphone arrays, whereas depth-camera localization yields a direction and position relative to the depth camera. To use both kinds of information in the system for accurate speaker localization, the coordinate systems must be converted. The microphone arrays used in this system can localize only in a two-dimensional space, which corresponds to the left-right axis and the depth axis of the depth camera, i.e. the x axis and the z axis. Assume that in the two-dimensional space of the microphone arrays the coordinates of MicA are (0, 0) and the coordinates of MicB are (length, 0), where length is the spacing between the microphone arrays. After microphone-array localization, the coordinates of the depth camera (which is at the same position as the TV) are (x, y), its direction relative to MicA is θ0, and its direction relative to MicB is θ1. In the depth camera, the depth of MicA is depth0 and the depth of MicB is depth1. Although the depth information is not equal to the actual distance, it is related to the actual distance by:
$$y = f(\mathrm{depth})$$
where y0 and y1 are the actual distances of MicA and MicB from the depth camera along the depth direction. From trigonometric functions it follows that:
That is:
From triangle geometry, θ2 and θ3 also satisfy:
$$\theta_2 + \theta_3 = \theta_0 + \theta_1$$
Finally, it can be calculated that:
$$\theta_2 = (\theta_0 + \theta_1) - \theta_3$$
Note that only the case in which θ2, θ3, θ0 and θ1 are all acute angles is analyzed here; the other cases are similar. With the above method, the two-dimensional position relative to the microphone arrays obtained by microphone-array localization can be converted into the left-right and depth axis positions in the three-dimensional space of the depth camera.
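The coordinate conversion can be illustrated with the sketch below. It does not reproduce the angle formulas of this section; it takes a different but equivalent geometric route, using the fact that the depth of a point is its projection onto the camera's depth axis, so that y1 - y0 equals length times the cosine of the angle between the depth axis and the line MicA-MicB. The inputs y0 and y1 are assumed to be already converted from depth values to actual distances by f. The function names `camera_axes` and `to_camera`, and the sign convention of the left-right axis, are assumptions.

```python
import math

def camera_axes(cam, length, y0, y1):
    """Return (depth_axis, lateral_axis) unit vectors in array coordinates.

    cam: camera position (x, y) in the array frame, MicA at (0, 0) and
    MicB at (length, 0). y0, y1: actual depth-direction distances of MicA
    and MicB from the camera.
    """
    cos_phi = max(-1.0, min(1.0, (y1 - y0) / length))
    sin_phi = math.sqrt(1.0 - cos_phi ** 2)
    # Two mirror-image axes fit y1 - y0; keep the one that also gives
    # MicA, located at (0, 0), the measured depth y0.
    candidates = [(cos_phi, sin_phi), (cos_phi, -sin_phi)]
    depth = min(
        candidates,
        key=lambda d: abs(-cam[0] * d[0] - cam[1] * d[1] - y0),
    )
    lateral = (depth[1], -depth[0])  # depth axis rotated by -90 degrees
    return depth, lateral

def to_camera(point, cam, depth, lateral):
    """Convert an array-frame point to (left-right, depth) camera coordinates."""
    dx, dy = point[0] - cam[0], point[1] - cam[1]
    return (dx * lateral[0] + dy * lateral[1], dx * depth[0] + dy * depth[1])

cam = (1.0, -2.0)
depth, lateral = camera_axes(cam, length=2.0, y0=1.0, y1=2.2)
print(to_camera((0.0, 0.0), cam, depth, lateral))  # MicA in camera coordinates
```

Once the two axes are known, any speaker position estimated in the array frame can be passed through `to_camera` to get its x and z coordinates in the camera frame.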
During use, the user may change the camera angle (the positions of the microphone arrays are fixed). Once the camera angle changes, the system must be able to update its parameters automatically and repeat the above calculation, converting the two-dimensional position from the microphone arrays into the left-right and depth axis positions in the three-dimensional space of the depth camera. Because the user may change the camera angle during a conference, the preset recording cannot be played. Instead, the system can use the far-end single-talk decision of the echo cancellation algorithm to identify a period in which only the TV is playing sound and no local speaker is talking, which ensures that the calculation is not disturbed and that the result is sufficiently accurate. With the above method, the speaker position estimated by the microphone arrays can be converted into a position in the depth/image camera, and algorithms such as skin-color detection or face detection can then be used to obtain the position of the speaker in the depth/image camera.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 3
This embodiment further provides an audio signal processing apparatus. The apparatus is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 10 is a schematic structural diagram of the audio signal processing apparatus according to an embodiment of the present invention. As shown in Fig. 10, the apparatus includes:
a first calculation module 1002, configured to perform calculation according to a first preset algorithm on the audio signals collected by multiple microphones to obtain a first predicted position of the object to be detected; a second calculation module 1004, configured to filter the historical positions of the object to be detected according to a second preset algorithm and then calculate a second predicted position of the object to be detected; and a correction module 1006, configured to combine the first predicted position and the second predicted position and perform correction according to the continuity of the audio signal in time, obtaining the position where the object to be detected is currently located.
In the audio signal processing apparatus provided by the embodiment of the present invention, calculation is performed according to the first preset algorithm on the audio signals collected by multiple microphones to obtain the first predicted position of the object to be detected; the historical positions of the object to be detected are filtered according to the second preset algorithm and the second predicted position of the object to be detected is then calculated; and the first predicted position and the second predicted position are combined and corrected according to the continuity of the audio signal in time, obtaining the position where the object to be detected is currently located. This solves the problem that, for lack of a technique for tracking the speaker's position, a network conference system cannot show the speaker's position in time or track and obtain the speaker's multimedia information, and achieves the effect of obtaining the speaker's position in time and tracking and obtaining the speaker's multimedia information.
Embodiment 4
Fig. 11 is a schematic structural diagram of the image processing apparatus according to an embodiment of the present invention. As shown in Fig. 11, the apparatus includes:
an acquisition module 1102, configured to obtain, by means of preset microphone arrays, a first depth value between the first microphone array and the image capture device of the display device, and a second depth value between the second microphone array and the image capture device of the display device; a calculation module 1104, configured to calculate a first-type angle between the first microphone array corresponding to the first depth value and the image capture device, and a second-type angle between the second microphone array corresponding to the second depth value and the image capture device; a coordinate space module 1106, configured to construct a multidimensional spatial coordinate system from the first depth value, the second depth value, the first-type angle and the second-type angle; and an acquisition module 1108, configured to obtain the position of the object to be detected and determine the position of the object to be detected in the multidimensional spatial coordinate system.
In the image processing apparatus provided by the embodiment of the present invention, a multidimensional spatial coordinate system is constructed from the first depth value, the second depth value, the first-type angle and the second-type angle. This solves the problem that, for lack of a technique for tracking the speaker's position, a network conference system cannot show the speaker's position in time or track and obtain the speaker's multimedia information, and achieves the effect of obtaining the speaker's position in time and tracking and obtaining the speaker's multimedia information.
Embodiment 5
According to an embodiment of the present invention, a system for audio signal and image processing is provided, including: a video conference terminal, an image capture device, a depth image capture device, a sound acquisition module composed of multiple microphone arrays, and a display device. The sound acquisition module composed of multiple microphone arrays is used to collect the audio signal of the object to be detected; the image capture device is used to capture all video images of the conference site; the depth image capture device is used to capture depth images of the conference site, the depth images being used to obtain positional information between the participants and the depth image capture device; and the video conference terminal is used to track the positions of the participants, display the image of a participant while that participant is speaking, and record the meeting minutes.
In summary, combining Embodiment 1 to Embodiment 5, the method, apparatus and system for audio signal and image processing provided by the embodiments of the present application are as follows:
First, the system predicts and tracks the speaker position in real time with the joint multi-microphone-pair TDOA algorithm, while also predicting and tracking the speaker position with Kalman filtering, and performs self-correction according to the continuity of the voice signal in time, obtaining an accurate estimate of the speaker's position.
In addition, a depth camera is fixed in the system. The depth information of each participant in the room is obtained by the depth camera and used as a constraint to adjust the microphone arrays' estimate of the speaker position.
Next, the system feeds the obtained speaker position back to the system's image camera, which captures the speaker's image.
Finally, the speaker's voice is recognized, or enhanced, according to the above information, and the result is presented to the user, either as dynamic captions or as meeting minutes with the speaker's image.
The hardware includes: a video conference terminal, an image camera, a depth camera, two microphone arrays A and B, and a TV.
During a video conference, the method and system can automatically locate and track a specific speaker of interest according to the user's selection and enhance that speaker's voice, and can further process the audio signal so as to present dynamic captions or meeting minutes to the user. The solution is simple, fast and real-time, and its localization and tracking are more accurate.
The processing, enhancement and display of the audio signal of interest are as follows:
The method described above can calculate the speaker position with the microphone arrays and, by combining the image camera and the depth camera, obtain the relative position of the microphone arrays, so that the position of the speaker is finally related to the depth/image camera. The user can mark a speaker's voice as the voice of interest in the system while that speaker is talking, in order to extract that speaker's voice. The user can also select a participant in the system's video image and then mark that participant's voice as the voice of interest, in order to extract that speaker's voice. A beamforming algorithm can additionally be used to enhance the voice from the direction of the voice of interest and suppress the voice from other directions. A face detection algorithm can also be used to obtain the speaker's head image, which is combined with the audio signal processing algorithm to show the user the speaker's head image and the content of the speech during the conference.
Fig. 12 is a schematic structural diagram of the system for audio signal and image processing according to an embodiment of the present invention. As shown in Fig. 12, in the method for processing the voice signal of interest, the user selects a participant in the system's video image and that participant is treated as the speaker of the voice of interest. The steps are as follows:
Step 1: While the system is running, detect in real time whether anyone is speaking locally. If someone is speaking, estimate the speaker position with the microphone arrays and convert it into the left-right and depth axis positions in the three-dimensional space of the depth camera;
Step 2: A local or remote participant selects, with the mouse or by touch, the region of the video image where the speaker of interest is located; the person in that region becomes the speaker of interest;
Step 3: The system determines the facial features of the speaker of interest, tracks the speaker of interest with a face tracking algorithm, updates the position of the speaker of the voice of interest in real time, and converts it into a speaker position as estimated by the microphone arrays;
Step 4: Use a beamforming algorithm to enhance the voice from the direction of the voice of interest and suppress the voice from other directions.
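The source names beamforming without specifying a particular algorithm. As a hedged illustration, the sketch below uses delay-and-sum, the simplest beamformer: each channel is advanced by the delay that the direction of interest would cause, then the channels are averaged, so sound from that direction adds coherently while sound from other directions partly cancels. The function name `delay_and_sum` is hypothetical, and delays are assumed to be non-negative whole samples.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer over equally sampled channels.

    channels: list of sample lists, one per microphone.
    delays: non-negative integer delay (in samples) of each channel for the
            direction of interest; channel k is read starting at delays[k].
    Returns the averaged, time-aligned signal.
    """
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]

# The second microphone hears the same ramp one sample later.
ch0 = [0, 1, 2, 3, 4, 5]
ch1 = [0, 0, 1, 2, 3, 4]
print(delay_and_sum([ch0, ch1], [0, 1]))  # aligned channels reproduce the ramp
```

The per-channel delays would come from the estimated speaker direction and the microphone geometry, using the same relation between delay and angle as in the TDOA step.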
The method for displaying the voice of interest is as follows:
After audio signal processing, the content of the speech in the voice of interest can be obtained. If the user uses the audio signal processing and enhancement method of interest to identify a speaker of interest, the face region image is obtained directly in the selected region by face detection and tracking algorithms, and the voice of the speaker of interest is recognized with the audio signal processing and enhancement method described above. The system thus obtains what a given speaker of interest said during a given period (as text) together with that speaker's face image. Using this information, the system can finally present the user with illustrated meeting minutes or real-time captions that are easy to view and to review.
The above records the speech content of one specific speaker. If the speech content of everyone throughout the conference is to be recorded, the recording flow is different. First, during the conference the system performs real-time face recognition on the images captured by the image camera to determine the facial features of all participants in the field of view. Detection is performed in real time to handle participants who temporarily leave or join during the conference. Next, when a participant speaks, the positions of all participants relative to the microphone arrays are determined by the above method, and the speaker's voice (there may be several speakers) is enhanced, recognized and stored as text. Combined with the speaker's head image extracted from the image camera, this generates real-time captions or complete meeting minutes. The minutes are stored in time order, and editing operations such as filtering and screening are also supported.
As shown in Fig. 13, Fig. 13 is a schematic diagram of the text display method corresponding to the voice of interest.
Embodiment 6
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: perform calculation according to a first preset algorithm on the audio signals collected by multiple microphones to obtain a first predicted position of the object to be detected;
S2: filter the historical positions of the object to be detected according to a second preset algorithm and then calculate a second predicted position of the object to be detected;
S3: combine the first predicted position and the second predicted position and perform correction according to the continuity of the audio signal in time, obtaining the position where the object to be detected is currently located.
Optionally, the storage medium is further configured to store program code for performing the following steps:
S1: performing calculation according to the first preset algorithm on the audio signals collected by multiple microphones to obtain the first predicted position of the object to be detected includes: classifying the multiple microphones into a first microphone array and a second microphone array; calculating a first angle between the object to be detected and the first microphone array according to the first preset algorithm, and calculating a second angle between the object to be detected and the second microphone array according to the first preset algorithm; and calculating the first predicted position of the object to be detected from the first angle and the second angle using a preset trigonometric function.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
Further, optionally, the storage medium is further configured to store program code for performing the following steps: calculating the first angle between the object to be detected and the first microphone array according to the first preset algorithm includes: when the first preset algorithm is the time difference of arrival (TDOA) algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the first microphone array; performing calculation according to the relationship between the first angle and the Euclidean distance between the audio signals collected by the microphones to obtain a set of estimates of the first angle; and calculating the mean of the set of estimates of the first angle and taking the mean as the first angle.
Optionally, the storage medium is further configured to store program code for performing the following steps: calculating the second angle between the object to be detected and the second microphone array according to the first preset algorithm includes: when the first preset algorithm is the time difference of arrival (TDOA) algorithm, calculating the Euclidean distance between the audio signals collected by the microphones in the second microphone array; performing calculation according to the relationship between the second angle and the Euclidean distance between the audio signals collected by the microphones to obtain a set of estimates of the second angle; and calculating the mean of the set of estimates of the second angle and taking the mean as the second angle.
Optionally, the storage medium is further configured to store program code for performing the following steps: filtering the historical positions of the object to be detected according to the second preset algorithm and then calculating the second predicted position of the object to be detected includes: calculating, by the first preset algorithm, a first set of estimates of the first predicted angle of the first microphone array and a second set of estimates of the second predicted angle of the second microphone array; when the second preset algorithm is the Kalman filtering algorithm, judging by the Kalman filtering algorithm whether the first set of estimates and the second set of estimates each meet a preset condition; determining the first angle and the second angle according to the judgment result; and calculating the second predicted position of the object to be detected from the first angle and the second angle using a preset trigonometric function.
Further, optionally, the storage medium is further configured to store program code for performing the following step: updating the Kalman filter parameters according to the position where the object to be detected is currently located.
Further, optionally, the storage medium is further configured to store program code for performing the following step: after the position where the object to be detected is currently located is obtained, the method further includes enhancing the voice output of the object to be detected.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; they are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described can be performed in an order different from the one given here, or they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (12)
- A kind of 1. method of Audio Signal Processing, it is characterised in that including:Calculated according to the first preset algorithm according to the audio signal that multiple Mikes gather, obtain object to be detected first is pre- Location is put;Calculated after being filtered according to the second preset algorithm to the historical position of the object to be detected, it is described to be detected right to obtain The second predicted position of elephant;Enter with reference to the continuity of first predicted position and second predicted position according to the audio signal in time Row correction, obtains the position that the object to be detected is currently located.
- 2. according to the method for claim 1, it is characterised in that described to be gathered according to the first preset algorithm according to multiple Mikes Audio signal calculated, obtaining the first predicted position of object to be detected includes:The multiple Mike is classified, is divided into the first microphone array and the second microphone array;The first angle between the object to be detected and first microphone array is calculated according to first preset algorithm, with And calculate the second angle between the object to be detected and second microphone array according to first preset algorithm;According to default trigonometric function, by first angle and second angle, the object to be detected is calculated First predicted position.
- 3. according to the method for claim 2, it is characterised in that described described to be checked according to first preset algorithm calculating The first angle surveyed between object and first microphone array includes:In the case where first preset algorithm is arrival time difference algorithm TDOA, calculate each in first microphone array Euclidean distance between the audio signal of Mike's collection;Relation according to the Euclidean distance and first angle between the audio signal of each Mike collection is calculated, Obtain the estimation value set of first angle;The average of the estimation value set of first angle is calculated, and the average is defined as first angle.
- 4. The method according to claim 2, characterized in that calculating, according to the first preset algorithm, the second angle between the object to be detected and the second microphone array comprises: where the first preset algorithm is the time difference of arrival (TDOA) algorithm, calculating the Euclidean distances between the audio signals collected by the individual microphones in the second microphone array; calculating, from the relation between the Euclidean distances between the audio signals collected by the individual microphones and the second angle, a set of estimated values of the second angle; and calculating the mean of the set of estimated values of the second angle, and taking the mean as the second angle.
- 5. The method according to claim 2, characterized in that filtering the historical positions of the object to be detected according to the second preset algorithm and then performing a calculation, to obtain the second predicted position of the object to be detected, comprises: calculating, by the first preset algorithm, a first set of estimated values of a first predicted angle of the first microphone array and a second set of estimated values of a second predicted angle of the second microphone array; where the second preset algorithm is a Kalman filtering algorithm, judging by the Kalman filtering algorithm whether the first set of estimated values and the second set of estimated values each satisfy a preset condition; determining the first angle and the second angle according to the judgment result; and calculating the second predicted position of the object to be detected from the first angle and the second angle according to the preset trigonometric function.
- 6. The method according to claim 5, characterized in that, after the position where the object to be detected is currently located is obtained, the method further comprises: updating the Kalman filter parameters according to the position where the object to be detected is currently located.
- 7. The method according to any one of claims 1 to 6, characterized in that, after the position where the object to be detected is currently located is obtained, the method further comprises: enhancing the voice output of the object to be detected.
- 8. A method of image processing, characterized by comprising: obtaining, by means of preset microphone arrays, a first depth value between a first microphone array and an image capture device of a display device, and a second depth value between a second microphone array and the image capture device of the display device; calculating a first-type angle between the first microphone array corresponding to the first depth value and the image capture device, and calculating a second-type angle between the second microphone array corresponding to the second depth value and the image capture device; building a multidimensional spatial coordinate system from the first depth value, the second depth value, the first-type angle and the second-type angle; and obtaining the position of an object to be detected, and determining, according to the multidimensional spatial coordinate system, the position of the object to be detected in the multidimensional spatial coordinate system.
- 9. The method according to claim 8, characterized in that calculating the first-type angle between the first microphone array corresponding to the first depth value and the image capture device, and calculating the second-type angle between the second microphone array corresponding to the second depth value and the image capture device, comprises: calculating the first-type angle and the second-type angle according to a preset condition relating the first depth and the second depth to the actual distance.
- 10. A device for audio signal processing, characterized by comprising: a first computing module, for performing a calculation on the audio signals collected by a plurality of microphones according to a first preset algorithm, to obtain a first predicted position of an object to be detected; a second computing module, for filtering the historical positions of the object to be detected according to a second preset algorithm and then performing a calculation, to obtain a second predicted position of the object to be detected; and a correction module, for performing a correction that combines the first predicted position and the second predicted position according to the continuity of the audio signals in time, to obtain the position where the object to be detected is currently located.
- 11. A device for image processing, characterized by comprising: an acquisition module, for obtaining, by means of preset microphone arrays, a first depth value between a first microphone array and an image capture device of a display device, and a second depth value between a second microphone array and the image capture device of the display device; a computing module, for calculating a first-type angle between the first microphone array corresponding to the first depth value and the image capture device, and calculating a second-type angle between the second microphone array corresponding to the second depth value and the image capture device; a coordinate space module, for building a multidimensional spatial coordinate system from the first depth value, the second depth value, the first-type angle and the second-type angle; and an acquisition module, for obtaining the position of an object to be detected, and determining, according to the multidimensional spatial coordinate system, the position of the object to be detected in the multidimensional spatial coordinate system.
- 12. A system for audio signal and image processing, characterized by comprising: a video conference terminal, an image capture device, a depth image capture device, a sound acquisition module composed of a plurality of microphone arrays, and a display device, wherein: the sound acquisition module composed of the plurality of microphone arrays is used to collect the audio signals of an object to be detected; the image capture device is used to capture all video images of the conference site; the depth image capture device is used to capture depth images of the conference site, the depth images being used to obtain position information between the participants and the depth image capture device; and the video conference terminal is used to track the positions of the participants, display the image of a participant while that participant is speaking, and record the meeting minutes.
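
The localization steps in claims 2 to 6 (an angle per microphone array, a trigonometric intersection of the two angles, and a filter-based correction) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: it assumes a far-field TDOA model in which a bearing follows from the arrival delay across one microphone pair, it places the two arrays at the ends of a known baseline, and it replaces the full Kalman filter with a single scalar gain blend. All function names, the `SPEED_OF_SOUND` constant, and the array geometry are hypothetical and do not appear in the claims.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the patent


def bearing_from_delay(delay_s, mic_spacing_m):
    """Far-field bearing (radians) of a source relative to a two-microphone axis."""
    # Clamp so measurement noise cannot push acos outside its domain [-1, 1].
    cos_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / mic_spacing_m))
    return math.acos(cos_theta)


def mean_bearing(delays_s, mic_spacing_m):
    """Average several bearing estimates into one angle (cf. claims 3 and 4)."""
    estimates = [bearing_from_delay(d, mic_spacing_m) for d in delays_s]
    return sum(estimates) / len(estimates)


def triangulate(angle1, angle2, baseline_m):
    """Intersect two bearings from arrays at (0, 0) and (baseline_m, 0).

    Both angles are measured from the positive x-axis (cf. claim 2).
    """
    t1, t2 = math.tan(angle1), math.tan(angle2)
    if math.isclose(t1, t2):
        raise ValueError("bearings are parallel; no unique intersection")
    # Solve x * t1 == (x - baseline_m) * t2 for x, then read y off the first ray.
    x = baseline_m * t2 / (t2 - t1)
    return x, x * t1


def kalman_correct(predicted, measured, pred_var, meas_var):
    """Blend a predicted and a measured position with a scalar Kalman gain.

    A simplified stand-in for the correction in claim 1 and the parameter
    update in claim 6; returns the fused position and its updated variance.
    """
    gain = pred_var / (pred_var + meas_var)
    fused = tuple(p + gain * (m - p) for p, m in zip(predicted, measured))
    return fused, (1.0 - gain) * pred_var
```

In use, one would call `mean_bearing` once per array, pass the two angles to `triangulate` to get a measured position, and then feed that position together with the filter's prediction into `kalman_correct`. Bearings close to 90 degrees make `math.tan` very large, so a production version would likely intersect the rays with direction vectors instead.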
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610826122.5A CN107820037B (en) | 2016-09-14 | 2016-09-14 | Audio signal, image processing method, device and system |
PCT/CN2017/097397 WO2018049957A1 (en) | 2016-09-14 | 2017-08-14 | Audio signal, image processing method, device, and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610826122.5A CN107820037B (en) | 2016-09-14 | 2016-09-14 | Audio signal, image processing method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107820037A (en) | 2018-03-20 |
CN107820037B (en) | 2021-03-26 |
Family
ID=61600778
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610826122.5A Active CN107820037B (en) | 2016-09-14 | 2016-09-14 | Audio signal, image processing method, device and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107820037B (en) |
WO (1) | WO2018049957A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109547735A (en) * | 2019-01-18 | 2019-03-29 | 海南科先电子科技有限公司 | A kind of meeting integrated system |
CN109683135A (en) * | 2018-12-28 | 2019-04-26 | 科大讯飞股份有限公司 | A kind of sound localization method and device, target capturing system |
CN110632582A (en) * | 2019-09-25 | 2019-12-31 | 苏州科达科技股份有限公司 | Sound source positioning method, device and storage medium |
CN110730378A (en) * | 2019-11-01 | 2020-01-24 | 联想(北京)有限公司 | Information processing method and system |
CN111312295A (en) * | 2018-12-12 | 2020-06-19 | 深圳市冠旭电子股份有限公司 | Holographic sound recording method and device and recording equipment |
CN112198498A (en) * | 2020-09-11 | 2021-01-08 | 海创半导体科技(深圳)有限公司 | Method for measuring distance by using intelligent voice module |
CN112868061A (en) * | 2019-11-29 | 2021-05-28 | 深圳市大疆创新科技有限公司 | Environment detection method, electronic device and computer-readable storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110660102B (en) * | 2019-06-17 | 2020-10-27 | 腾讯科技(深圳)有限公司 | Speaker recognition method, device and system based on artificial intelligence |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1460185A (en) * | 2001-03-30 | 2003-12-03 | 皇家菲利浦电子有限公司 | Method and apparatus for audio-image speaker detection and location |
CN101030323A (en) * | 2007-04-23 | 2007-09-05 | 凌子龙 | Automatic evidence collecting device on crossroad for vehicle horning against traffic regulation |
US20080170717A1 (en) * | 2007-01-16 | 2008-07-17 | Microsoft Corporation | Energy-based sound source localization and gain normalization |
CN101377885A (en) * | 2007-08-28 | 2009-03-04 | 凌子龙 | Electronic workstation for obtaining evidence of vehicle peccancy whistle and method thereof |
CN102256098A (en) * | 2010-05-18 | 2011-11-23 | 宝利通公司 | Videoconferencing endpoint having multiple voice-tracking cameras |
US20150201278A1 (en) * | 2014-01-14 | 2015-07-16 | Cisco Technology, Inc. | Muting a sound source with an array of microphones |
CN204539315U (en) * | 2015-04-02 | 2015-08-05 | 尹煜敏 | A kind of video conference machine of auditory localization |
CN105607042A (en) * | 2014-11-19 | 2016-05-25 | 北京航天长峰科技工业集团有限公司 | Method for locating sound source through microphone array time delay estimation |
CN105657329A (en) * | 2016-02-26 | 2016-06-08 | 苏州科达科技股份有限公司 | Video conference system, processing device and video conference method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7039199B2 (en) * | 2002-08-26 | 2006-05-02 | Microsoft Corporation | System and process for locating a speaker using 360 degree sound source localization |
CN101201399B (en) * | 2007-12-18 | 2012-01-11 | 北京中星微电子有限公司 | Sound localization method and system |
CN101656908A (en) * | 2008-08-19 | 2010-02-24 | 深圳华为通信技术有限公司 | Method for controlling sound focusing, communication device and communication system |
CN102843543B (en) * | 2012-09-17 | 2015-01-21 | 华为技术有限公司 | Video conferencing reminding method, device and video conferencing system |
CN103841357A (en) * | 2012-11-21 | 2014-06-04 | 中兴通讯股份有限公司 | Microphone array sound source positioning method, device and system based on video tracking |
CN105588543B (en) * | 2014-10-22 | 2019-10-18 | 中兴通讯股份有限公司 | A kind of method, apparatus and positioning system for realizing positioning based on camera |
- 2016-09-14: CN application CN201610826122.5A, published as CN107820037B (en), status: Active
- 2017-08-14: WO application PCT/CN2017/097397, published as WO2018049957A1 (en), status: Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312295A (en) * | 2018-12-12 | 2020-06-19 | 深圳市冠旭电子股份有限公司 | Holographic sound recording method and device and recording equipment |
CN109683135A (en) * | 2018-12-28 | 2019-04-26 | 科大讯飞股份有限公司 | A kind of sound localization method and device, target capturing system |
CN109547735A (en) * | 2019-01-18 | 2019-03-29 | 海南科先电子科技有限公司 | A kind of meeting integrated system |
CN109547735B (en) * | 2019-01-18 | 2024-04-16 | 海南科先电子科技有限公司 | Conference integration system |
CN110632582A (en) * | 2019-09-25 | 2019-12-31 | 苏州科达科技股份有限公司 | Sound source positioning method, device and storage medium |
CN110632582B (en) * | 2019-09-25 | 2022-03-29 | 苏州科达科技股份有限公司 | Sound source positioning method, device and storage medium |
CN110730378A (en) * | 2019-11-01 | 2020-01-24 | 联想(北京)有限公司 | Information processing method and system |
CN112868061A (en) * | 2019-11-29 | 2021-05-28 | 深圳市大疆创新科技有限公司 | Environment detection method, electronic device and computer-readable storage medium |
CN112198498A (en) * | 2020-09-11 | 2021-01-08 | 海创半导体科技(深圳)有限公司 | Method for measuring distance by using intelligent voice module |
Also Published As
Publication number | Publication date |
---|---|
CN107820037B (en) | 2021-03-26 |
WO2018049957A1 (en) | 2018-03-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107820037A (en) | The methods, devices and systems of audio signal, image processing | |
CN110082723B (en) | Sound source positioning method, device, equipment and storage medium | |
US9595259B2 (en) | Sound source-separating device and sound source-separating method | |
US10122972B2 (en) | System and method for localizing a talker using audio and video information | |
CN103841357A (en) | Microphone array sound source positioning method, device and system based on video tracking | |
CN100399240C (en) | Communication and collaboration system using rich media environments | |
CN101198945B (en) | Management system for rich media environments | |
Lee et al. | Portable meeting recorder | |
Zhou et al. | Target detection and tracking with heterogeneous sensors | |
CN103581608B (en) | Spokesman's detection system, spokesman's detection method and audio/video conferencing system | |
CN101567969B (en) | Intelligent video director method based on microphone array sound guidance | |
GB2342802A (en) | Indexing conference content onto a timeline | |
US10582117B1 (en) | Automatic camera control in a video conference system | |
CN111432115B (en) | Face tracking method based on voice auxiliary positioning, terminal and storage device | |
CN111046850B (en) | Speaker positioning method based on sound and image fusion | |
CN103581606A (en) | Multimedia collecting device and method | |
US9165182B2 (en) | Method and apparatus for using face detection information to improve speaker segmentation | |
CN110503957A (en) | A kind of audio recognition method and device based on image denoising | |
US9756421B2 (en) | Audio refocusing methods and electronic devices utilizing the same | |
CN111551921A (en) | Sound source orientation system and method based on sound image linkage | |
CN115242971A (en) | Camera control method and device, terminal equipment and storage medium | |
JP2005141687A (en) | Method, device, and system for object tracing, program, and recording medium | |
McCowan et al. | Speech acquisition in meetings with an audio-visual sensor array | |
Chen et al. | Speaker tracking and identifying based on indoor localization system and microphone array | |
US9883142B1 (en) | Automated collaboration system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
TA01 | Transfer of patent application right | Effective date of registration: 20180426. Address after: No. 55, Nanshan District science and technology road, Nanshan District, Shenzhen, Guangdong. Applicant after: ZTE Corporation. Address before: 210012 No. 68 Bauhinia Road, Yuhuatai District, Jiangsu, Nanjing. Applicant before: Nanjing Zhongxing New Software Co., Ltd. |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |