WO2015090039A1 - 一种声音处理方法、装置及设备 - Google Patents

一种声音处理方法、装置及设备 Download PDF

Info

Publication number
WO2015090039A1
WO2015090039A1 PCT/CN2014/081511 CN2014081511W WO2015090039A1 WO 2015090039 A1 WO2015090039 A1 WO 2015090039A1 CN 2014081511 W CN2014081511 W CN 2014081511W WO 2015090039 A1 WO2015090039 A1 WO 2015090039A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
signal
axis
speaker
sound source
Prior art date
Application number
PCT/CN2014/081511
Other languages
English (en)
French (fr)
Inventor
吴文海
王田
张德军
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2015090039A1 publication Critical patent/WO2015090039A1/zh

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a sound signal processing method, apparatus, and device. Background technique
  • the audio stream in the audio conference is processed by using a 3D sound processing method, that is, by assigning a sound image position to each audio stream, and adjusting the gain of the audio stream in the left and right channels according to the positional relationship of the audio streams of the respective sound image positions.
  • the size in turn, creates a three-dimensional sound effect.
  • the current 3D sound processing method is to realize the 3D sound effect of the venue through simple gain adjustment of the left and right channels, but the current 3D sound effect is to play the audio through the fixed speaker, so the current 3D sound effect can only achieve a single effect, reducing User experience. Summary of the invention
  • the embodiment of the invention provides a method, a device and a device for processing a sound signal, which are used to solve the problem that the 3D sound effect is single in the prior art.
  • a first aspect of the present invention provides a method for processing a sound signal, including:
  • the differential signal L on the X-axis and the Y-axis can be obtained by the following formula:
  • the differential signal z on the Z-axis can be obtained by the following formula:
  • g in zl , r zl are the gain coefficients and delay signals of a sound collection point on the ⁇ axis, respectively
  • gain z2 , 5( ⁇ _ ⁇ ⁇ 2 ) are the gain coefficients and delays of another sound collection point on the Z axis, respectively.
  • the chirp signal, ⁇ is the ratio between the distance between two adjacent sound collection points and the speed of sound transmission.
  • the method further includes:
  • the amplitude value of the sound source signal corresponds to the amplitude value of the differential signal on each coordinate axis.
  • the sound source signal is determined by the following equation (the amplitude value of 0 corresponds to the amplitude value of the differential signal on each coordinate axis: x
  • u*cos( ⁇ 9)*cos( ) ⁇
  • is the differential signal > the amplitude value on the x-axis
  • is the amplitude value of the differential signal ⁇ on the ⁇ axis
  • U is the amplitude attenuation coefficient
  • is the coordinate origin of the sound source signal and the coordinate origin of the three-dimensional polar coordinate
  • the angle between the projection of the line in the pupil plane and the X axis represents the angle between the line and the plane of the pupil.
  • the location information of the speaker is obtained, and the sound output signal of the speaker is obtained according to the position information of the speaker and the differential signal corresponding to each coordinate axis, including :
  • the position information of the speaker and the differential signal on each coordinate axis are processed by the following formula to generate a sound output signal corresponding to the speaker:
  • m(k) 0.5* [S(t) + x * cos((3 ⁇ 4 ) cos( ) + y * sin((3 ⁇ 4 ) cos( ⁇ ) + sin( ⁇ )] where) represents the output signal of each speaker.
  • a second aspect of the embodiments of the present invention provides a sound signal processing apparatus, including:
  • An acquiring module configured to acquire a sound source signal in the set area and image information of the set area
  • a determining module configured to obtain, according to the image information, a coordinate position of the sound source signal in three-dimensional polar coordinates
  • a first processing module configured to obtain, according to a coordinate position of the sound source signal, a differential signal corresponding to each coordinate axis of the sound source signal;
  • a second processing module configured to acquire position information of the speaker, and obtain a sound output signal of the speaker according to the position information of the speaker and the differential signal corresponding to each coordinate axis.
  • the first processing module is further configured to obtain an amplitude value that is used to represent the strength of the sound source signal, according to the amplitude value of the sound source signal and the sound source signal The coordinate position in the preset three-dimensional polar coordinates is obtained, and the amplitude value of the sound source signal corresponds to the amplitude value of the differential signal on each coordinate axis.
  • a third aspect of the embodiments of the present invention provides a sound signal processing apparatus, including:
  • a sound collector for acquiring a sound signal in a set area
  • An image collector configured to acquire image information in the set area
  • a processor configured to obtain a coordinate position of the sound source signal in the three-dimensional polar coordinate according to the image information, and obtain a differential signal corresponding to each coordinate axis of the sound source signal according to the coordinate position of the sound source signal, and obtain Position information of the speaker, and obtaining a sound output signal of the speaker according to the position information of the speaker and the differential signal corresponding to each coordinate axis.
  • the sound source signal in the set region and the image information of the set region are acquired, and the coordinate position of the sound source signal in the three-dimensional polar coordinate is obtained according to the image information, and the sound source signal is obtained according to the coordinate position of the sound source signal.
  • the position information of the speaker is obtained, and the sound output signal of the speaker is obtained according to the position information of the speaker and the obtained differential signal of each coordinate axis, and finally the speaker outputs according to the obtained output sound signal.
  • the 3D sound effect is realized by a plurality of speakers, and the sound position of the speaker and the sound size can be reflected by the 3D sound effect, thereby avoiding the problem that the 3D sound effect in the prior art is single.
  • FIG. 1 is a flowchart of a method for processing a sound signal according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of coordinate positioning in a first set area according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of sound signal collection in an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a speaker setting position in a second setting area according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram showing a coordinate position of a speaker in a second setting area according to an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of a sound signal processing apparatus according to an embodiment of the present invention
  • FIG. 7 is a schematic structural diagram of a sound signal processing device according to an embodiment of the present invention. detailed description
  • the current 3D audio technology is more and more widely used, for example, in games, movies, and conferences, and can be applied to 3D audio technology.
  • 3D audio technology users can be more realistic, thereby enhancing the user's experience in sound.
  • video conferencing after applying 3D audio technology, not only can the user have a more realistic feeling, but also can make the speaker's voice information more recognizable.
  • the current 3D audio technology is generally accomplished by the gain and delay adjustment of two sound signals, which makes the implementation of 3D audio technology single, and this single implementation makes the 3D sound experience less realistic.
  • an embodiment of the present invention provides a method for processing a sound signal, the method comprising: acquiring a sound source signal in a set region and image information of a set region, and obtaining a sound source signal in the three-dimensional polar coordinate according to the image information.
  • the coordinate position is obtained according to the coordinate position of the sound source signal, and the differential signal corresponding to each coordinate axis of the sound source signal is obtained, the position information of the speaker is obtained, and the speaker is obtained according to the position information of the speaker and the differential signal corresponding to each coordinate axis.
  • the speaker speaks in the first conference room, and all the listeners are in the second conference room.
  • the sound processing device determines the speaker in the first room through the three-dimensional polar coordinates in the first conference room.
  • the coordinate position of the conference room, and the speaker's sound source signal is decomposed into differential signals on each axis, and then combined with the position of the speaker in the second conference room to determine the differential signal that the speaker should output, thereby passing through the speaker
  • the combined play so that the user can determine the speaking position of the speaker in the first conference room through the sound signal output by the speaker
  • Embodiment 1 is a diagrammatic representation of Embodiment 1:
  • FIG. 1 is a flowchart of a method for processing a sound signal according to an embodiment of the present invention, where the method includes:
  • a sound processing device is disposed in the setting area, and at least the sound collecting device and the image collecting device are included in the sound processing device.
  • the sound collecting device is an omnidirectional microphone for coming The sound source signal in the set area is collected
  • the image collecting device is an omnidirectional camera for collecting image information in the set area.
  • the sound collecting device When the user sends a sound signal in the set area, the sound collecting device will collect the sound source signal of the speaker while the image collecting device will collect the image information in the set area, and then obtain the sound source signal based on the analysis of the image.
  • the coordinate position in the three-dimensional polar coordinates in the set area When the user sends a sound signal in the set area, the sound collecting device will collect the sound source signal of the speaker while the image collecting device will collect the image information in the set area, and then obtain the sound source signal based on the analysis of the image.
  • the coordinate position in the three-dimensional polar coordinates in the set area When the user sends a sound signal in the set area, the sound collecting device will collect the sound source signal of the speaker while the image collecting device will collect the image information in the set area, and then obtain the sound source signal based on the analysis of the image.
  • the omnidirectional camera in the set area captures a panoramic image in a set area, and determines the coordinate position of the speaker in the set area based on the coordinate origin in the three-dimensional polar coordinates in the set area, for example
  • the speaker's position coordinates in the set area can be expressed using polar coordinates (r, ⁇ , ), where r represents the distance between the speaker and the coordinate origin.
  • r represents the distance between the speaker and the coordinate origin.
  • 0 represents the angle between the projection of the speaker's sound source position and the coordinate origin on the XOY plane and the X-axis, and 0 represents the angle between the projection on the XOY plane and the line. Since the sound source signal is sent by the speaker, the position of the speaker is determined while the sound source signal is
  • the sound processing device decomposes the sound source signal 5 (0) onto each coordinate axis of the three-dimensional polar coordinates, that is, obtains a differential signal in the direction of the X, ⁇ , and ⁇ axes, specifically
  • the differential signals in the direction of the X, ⁇ , and ⁇ axes can be based on the principle of free acoustic wave transmission, that is, when a point wave is transmitted to the signals of two virtual omnidirectional microphones that are very close to each other.
  • the differential signal on the coordinate axis for example, as shown in Fig. 3, in Fig. 3, an omnidirectional microphone can be virtualized as two omnidirectional microphones with very close distances, respectively, by acquiring the sound source signals corresponding to the two on the coordinate axes.
  • the first sound sample signal and the second sound sample signal on the adjacent collection points can obtain the sound source signal on the coordinate axis according to the difference between the first sound sampling signal and the second sound sampling signal on the coordinate axis. Differential signal.
  • the X-axis or the differential signal on Y can be, but is not limited to, the following formula (1):
  • L sqrt ⁇ 2) 12 * (gain * S(t - ⁇ ⁇ ) - gain L2 *S(tT L2 ))*K ( l )
  • L represents the differential signal on the X-axis: ⁇ 3 ⁇ 4 * S( t - r ri ) represents the first sound sampling signal collected by a sound collection point on the X-axis, gun xl , the gain coefficient of a sound collection point on the X-axis, and the delay signal, g ⁇ 2 *S(iT 2 ) represents the second sound sampling signal collected by another sound collection point on the X-axis, gain x2 , r x2 ) are respectively the gain coefficient and delay signal of another sound collection point on the X-axis, and K is a complex exponential sequence e.
  • the position of the sound source signal can be expressed by polar coordinates (r, ⁇ , the first sound sample signal on the X axis and the gain coefficient of the second sound sample signal in the first preset coordinate system
  • the delay time can be obtained by the polar signal S (0 pole coordinates), which can be obtained by the following formula:
  • T refers to the ratio between the distance between the acquisition center points of two virtual omnidirectional microphones and the sound transmission speed, that is, the sound delay time between two virtual omnidirectional microphones, and 1 indicates that the sound source signal is transmitted to X.
  • the time of a sound collection point on the axis, ⁇ 2 indicates the time at which the sound source signal is transmitted to another sound collection point on the X axis.
  • L represents the differential signal on the Y-axis: -) represents the first sound sampled signal collected at a sound collection point on the Y-axis, gam yl , the gain coefficient of a sound collection point on the Y-axis, and the delay signal, respectively.
  • Gain yl * S(t _ T 2 ) represents the second sound sample signal collected by another sound collection point on the Y axis, and gnn y2 and S( _ ⁇ 2 ) are respectively another sound collection on the Y axis.
  • the gain coefficient of the point and the delayed signal, K is the complex exponential sequence e j ,.
  • the position of the sound source signal can be represented by the polar coordinates ( r , ⁇ , 0 ), the gain coefficient and the delay time in the first sound sampling signal and the second sound source sampling signal on the Y axis can pass through the sound
  • the polar coordinates of the source signal are obtained, which can be obtained by the following formula:
  • T yi T * _o.5 + sqrt ⁇ / 4 - s i ) cos( ⁇ )] ⁇
  • T y i Indicates the time at which the sound source signal is transmitted to another sound collection point on the X-axis.
  • g" zl *S ( - represents the first sound sample signal collected by a sound collection point on the Z axis
  • gai , r zl is the gain coefficient of a sound collection point on the Z axis
  • the delay signal, gain z2 * S( - r z2 ) represents the second sound sampled signal collected by another sound collection point on the z-axis
  • gain z2 and S( _r z2 ) are respectively another sound collection point on the Z-axis.
  • the source signal S (0 is obtained from the polar coordinates, which can be obtained by the following formula:
  • the differential signal on each coordinate axis of the sound source signal obtained by the above processing process in each of the three-dimensional polar coordinates after obtaining the differential signal on each coordinate axis, it is necessary to adjust each coordinate according to the amplitude value of the sound source signal.
  • the amplitude value of the differential signal on the axis is necessary to adjust each coordinate according to the amplitude value of the sound source signal.
  • the amplitude value of the sound source signal is obtained, and according to the amplitude value of the sound source signal and the coordinate position of the sound source signal in the three-dimensional polar coordinates, the amplitude value of the sound source signal is determined to correspond to the amplitude value on each coordinate axis. Finally, the amplitude value of the differential signal of each coordinate axis is adjusted according to the amplitude value on each coordinate axis, and the specific adjustment manner can be, but is not limited to, obtained by the following formula:
  • is the amplitude value of the differential signal on the ⁇ axis
  • is the amplitude value of the differential signal ⁇ on the ⁇ axis
  • u is the amplitude attenuation coefficient
  • the amplitude values of the corresponding differential signals on the respective coordinate axes are adjusted according to the obtained amplitude values, that is, the amplitude values of the differential signals on the X axis are adjusted to the X ⁇ axis.
  • the amplitude value of the differential signal is adjusted such that the amplitude value of the differential signal on the z-axis is adjusted to ⁇ .
  • the three-dimensional polar coordinates are also present in the area where the speaker is located, and the position coordinates of the speaker can also be represented by polar coordinates, for example, as shown in FIG. 4, which is set in FIG. Is the three-dimensional polar coordinates, that is, including the X-axis, the Y-axis, and the Z-axis.
  • the position of the speaker in the three-dimensional polar coordinates should be the positional relationship shown in Figure 5.
  • the speaker is in three-dimensional polar coordinates, at this time the speaker
  • the position can be characterized by polar coordinates ( r, ⁇ , , ⁇ ), where r is used to characterize the distance between the speaker and the origin of the coordinates in the polar coordinate system, indicating that the line between the speaker and the coordinate origin is on the XOY plane.
  • the angle between the projection line and the X-axis, the angle between the projection line and the line, each of the speakers in Figure 5 can be characterized by polar coordinates.
  • the gain of the speaker is selected and calculated, and the gain-adjusted output signal is obtained, and the gain-adjusted output signal is as follows :
  • m(k) 0.5 * gain * [S(t) + x * cos ( ) cos(3 ⁇ 4 ) + y sin( ) cos( ⁇ 3 ⁇ 4 ) + z sin( ⁇ 3 ⁇ 4 )] After gaining each speaker gain adjustment After the output signal, the sound output signal of each speaker is sent to the corresponding speaker, so that the speaker outputs according to the obtained output signal.
  • 3D sound effects can be formed in the area, and the sound source signal can be restored more accurately, so that the listener can feel the position of the sound source signal and feel the sound source.
  • the position change and the intensity of the sound source signal change, thereby realizing the stereoscopic effect on the sound source signal in various orientations, and improving the rendering effect of the 3D sound effect.
  • the sound output signal of the speaker when the position of the speaker changes, that is, when the position of the sound source changes, the sound output signal of the speaker also changes at the same time, so that the output signal of the speaker can be changed with the position of the sound source.
  • the change reflects the position change of the sound source signal, and on the basis of improving the rendering effect of the 3D sound effect, the listener can also feel the sounding position of the speaker in the first set area at any time through the sound output signal outputted by the speaker. , realized 3D intrusive sound effects and improved user experience.
  • the sound source signals are respectively decomposed into differential signals on three coordinate axes by an omnidirectional microphone.
  • a general microphone can only collect two signals. Therefore, in the embodiment of the present invention, the sound source signal can be decomposed into differential signals on two coordinate axes, and the sound output signals finally sent to the speaker are obtained through the two differential signals.
  • the speaker is still in the set area, and there is a three-dimensional pole in the set area.
  • the position of the sound source signal can be represented by three-dimensional polar coordinates, namely: ( r, ⁇ , ⁇ ).
  • the differential signal of the sound source signal on the X-axis can be obtained.
  • the sound signal collection mode of the axis is the same as that of the above embodiment, that is, a microphone is virtualized as two adjacent microphone collection points, thereby obtaining two acquired sound signals, and the sound signals collected twice are obtained.
  • Obtain the differential signal on the X-axis as:
  • X sqrtil) 12 * (gain xl ⁇ S(tT xl )- gain x2 "S(tT xl - ⁇ ⁇ , ) where 3 ⁇ 4 characterizes the first sound collection point gain of the sound source signal on the x-axis, S -D Characterizing the delayed signal of the first sound collection point of the sound source signal on the X axis, g « 2 characterizing the second sound collection point gain of the sound source signal on the X axis, S(tT x2 _ 3 ) characterizing the sound source signal at X A delayed signal at the second sound collection point on the axis.
  • the position of the sound source signal can be represented by the polar coordinates ( ⁇ ) in the three-dimensional coordinate system
  • the gain on the X-axis and the delay time can be obtained by the polar coordinates of the sound source signal
  • the specific formula can be obtained by the following formula Obtain:
  • ⁇ 1 r*[-0.5 + ⁇ r (5/4-cos( ⁇ ))]
  • T x2 T * [-0.5 + sqrt(514 + cos((9))]
  • refers to the ratio between the distance between the acquisition points of two virtual omnidirectional microphones and the speed of sound transmission, that is, the sound delay time between two virtual omnidirectional microphones
  • r xl indicates that the sound source signal is transmitted to X.
  • the time of a sound collection point on the axis, T x2 indicates the time at which the sound source signal is transmitted to another sound collection point on the X axis.
  • the X-axis thus obtained is a differential signal directed to a heart shape.
  • ⁇ ⁇ characterizes the gain of the first sound collection point of the sound source signal on the x-axis
  • >S(i - ) characterizes the delayed signal of the first sound collection point of the sound source signal on the ⁇ axis
  • giw ⁇ 2 characterizes the sound source signal
  • the second sound collection point gain on the Y axis, r y2 characterizes the delayed signal of the second sound collection point of the sound source signal on the gamma axis.
  • the position of the sound source signal can be represented by the polar coordinates ( r , ⁇ ) in the three-dimensional coordinate system
  • the gain on the Y-axis and the delay time can be obtained by the polar coordinates of the sound source signal, and the specific Get the following formula:
  • T yl r * [-0.5 + S ⁇ ri(5 / 4 - sin( ⁇ ))]
  • Tj2 r * [_0 ⁇ 5 + sqrt ⁇ 514 + sin( ⁇ ))] indicates the time when the sound source signal S (0 is transmitted to a sound collection point on the Y axis, and r w indicates that the sound source signal is transmitted to the ⁇ axis Another time to collect sound points.
  • the Y-axis thus obtained is a differential signal pointing to a figure of eight
  • the two differential signals obtained are supplied to the two speaker outputs in the other region, and the output signals of the two speakers in the other region can pass.
  • the following formula is obtained:
  • the output signal of the speaker on the left side relative to the origin of the coordinate is characterized, of course; the output signal of the right speaker relative to the origin of the coordinate is characterized.
  • the gain adjustment here can be adjusted according to the actual application scenario, that is, the gain can be adjusted up or down.
  • the gain-adjusted output signal is obtained:
  • the final output signal is sent to the corresponding speaker, specifically the £ output signal is sent to the left speaker relative to the coordinate origin coordinates, and the 7? output signal is sent to the right speaker relative to the coordinate origin. Finally, the output signals of the left and right speakers are mixed in the second setting area to form a 3D sound effect, which increases the implementation of the 3D sound effect.
  • the speaker output signal is also adjusted correspondingly, and then passed.
  • the speaker outputs the adjusted output signal to form different 3D sound effects, so that the listener can feel the change of the position of the sound signal, thereby improving the user experience.
  • Embodiment 2 is a diagrammatic representation of Embodiment 1:
  • the embodiment of the present invention further provides an audio signal processing apparatus, where the apparatus includes:
  • the obtaining module 601 is configured to acquire a sound source signal in the set area and image information of the set area;
  • a determining module 602 configured to obtain a coordinate position of the sound source signal in the three-dimensional polar coordinate according to the image information
  • a first processing module 603 configured to obtain, according to a coordinate position of the sound source signal, a differential signal corresponding to each coordinate axis of the sound source signal;
  • the second processing module 604 is configured to acquire position information of the speaker, and obtain a sound output signal of the speaker according to the position information of the speaker and the differential signal corresponding to each coordinate axis.
  • the first processing module 603 is further configured to obtain an amplitude value that represents a signal strength of the sound source, and obtain an sound according to the amplitude value of the sound source signal and the coordinate position of the sound source signal in the preset three-dimensional polar coordinates.
  • the amplitude value of the source signal corresponds to the amplitude value of the differential signal on each of the coordinate axes.
  • the second processing module 604 is specifically configured to determine location information of the speaker in the area
  • m ⁇ k 0.5 * [S(t) + x * cos( ⁇ 3 ⁇ 4 ) cos( ) + _y * sin( ⁇ 3 ⁇ 4 ) cos( ⁇ ) + z * sin( ⁇ )] where represents the output of each speaker signal.
  • FIG. 7 is a schematic structural diagram of a sound signal processing device according to an embodiment of the present invention.
  • the device includes:
  • a sound collector 701, configured to acquire a sound signal in the set area
  • An image collector 702 configured to acquire image information in a set area
  • the processor 703 is configured to obtain a coordinate position of the sound source signal in the three-dimensional polar coordinate according to the image information, and obtain a differential signal corresponding to each coordinate axis of the sound source signal according to the coordinate position of the sound source signal, to obtain position information of the speaker, according to the speaker.
  • the position information, the differential signal corresponding to each coordinate axis obtains the sound output signal of the speaker.
  • the processor 703 is specifically configured to obtain a differential signal on the X-axis and the Y-axis by using the following formula:
  • Gain coefficient and delay signal g iin L2 , S(t - T i2 ) are the gain coefficient and delay signal of another sound collection point on the X-axis or the Y-axis, respectively, and ⁇ refers to two adjacent sound collection points.
  • the ratio between the distance between the distance and the speed of sound transmission, ⁇ is the complex exponential sequence.
  • the processor 703 is specifically configured to obtain a differential signal on the paraxial axis by using the following formula:
  • z sqrt(l) 12 * ⁇ gain zl ⁇ S(t - zl ) - gain z2 * S(t - ⁇ ⁇ 2 )) g in zl , S - r zl ) are the gain coefficients and delay signals of a sound collection point on the Z axis, respectively, and gain z2 and S(Y_r z2 ) are the gain coefficients of another sound collection point on the z axis, respectively.
  • the delayed signal, ⁇ is the ratio between the distance between two adjacent sound collection points and the speed of sound transmission.
  • the processor 703 is further configured to obtain an amplitude value representing a signal strength of the sound source, and obtain an amplitude value of the sound source signal according to the amplitude value of the sound source signal and the coordinate position of the sound source signal in the preset three-dimensional polar coordinates. The amplitude value of the differential signal on each axis.
  • the processor 703 is further configured to determine position information of the speaker in the area (where, the angle between the projection of the line between the speaker and the coordinate origin on the horizontal plane and the X axis is a connection and a Z
  • the angle between the axes, the position information of the speaker and the differential signal on each coordinate axis are processed by the following formula to generate a sound output signal corresponding to the speaker.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • computer-usable storage media including but not limited to disk storage, CD-ROM, optical storage, etc.
  • These computer program instructions can also be stored in a bootable computer or other programmable data processing device.
  • a computer readable memory that operates in a particular manner, causing instructions stored in the computer readable memory to produce an article of manufacture comprising instruction means implemented in one or more flows and/or block diagrams of the flowchart The function specified in the box or in multiple boxes.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device.
  • the instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Stereophonic System (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

一种声音信号处理方法、装置及设备,该方法包括:获取设定区域内的声源信号以及设定区域的图像信息(S101),根据图像信息得到三维极坐标中声源信号的坐标位置(S102),根据声源信号的坐标位置,得到声源信号对应每一个坐标轴的差分信号(S103),获取扬声器的位置信息,根据扬声器的位置信息以及得到的每一个坐标轴的差分信号,得到扬声器的声音输出信号(S104),最后扬声器根据得到的输出声音信号进行输出。

Description

一种声音处理方法、 装置及设备 技术领域
本发明涉及通信技术领域, 尤其涉及一种声音信号处理方法、 装置及设 备。 背景技术
随着通信技术的发展, 用户不仅可以使用网络进行信息的传输, 还可以 使用网络进行语音交互或者是视频交互, 因此用户可以实现远程的通讯, 比 如说远程的视频教学或者是远程的视频会议等等。
当前, 采用 3D声音处理方式对音频会议中的音频流进行处理, 即通过为 各个音频流分配声像位置, 以及根据各个声像位置的音频流的位置关系, 调 节音频流在左右声道的增益大小, 进而营造出一种立体的声响效果。
当前的 3D 声音处理方式是通过简单的左右声道的增益调节来实现会场 3D声效果, 但是当前的 3D声效果都是通过固定扬声器播放音频, 因此当前 的 3D声效果只能实现单一效果, 降低了用户体验度。 发明内容
本发明实施例提供了一种声音信号处理方法、 装置及设备, 用以解决现 有技术中 3D声效果单一的问题。
其具体的技术方案如下:
本发明实施例第一发明提供了一种声音信号处理方法, 包括:
获取设定区域内的声源信号以及所述设定区域的图像信息;
根据所述图像信息得到三维极坐标中所述声源信号的坐标位置; 根据所述声源信号的坐标位置, 得到所述声源信号对应每一个坐标轴的 差分信号;
获取扬声器的位置信息, 根据所述扬声器的位置信息、 所述对应每一个 坐标轴的差分信号, 得到所述扬声器的声音输出信号。 结合第一方面, 在第一种可能的实现方式中, 在 X轴以及 Y轴上的差分 信号 L可以通过如下公式得到:
L = sqrt{2) 12 * (gain * S(t - τ ) - gainL2 * S{t― TL2 )) * K g inLl 、 ^t - T )分别为 x轴或者是 Y轴上的一个声音采集点的增益系 数以及延迟信号, gctinL2、 S(t - Ti2)分别为 X轴或者是 Y轴上的另一个声音 采集点的增益系数和延迟信号, r是指两个相邻声音采集点之间的距离与声 音传输速度之间的比值, K为复指数序列。
结合第一方面, 在第二种可能的实现方式中, 在 Z轴上的差分信号 z可 以通过如下公式得到:
z = sqrtil) 12 * {gainzl " S(t - Tzl) - gainz2 * S(t - τζ2 ))
g inzl , rzl )分别为 Ζ轴上的一个声音采集点的增益系数以及延迟 信号, gainz2、 5(ί _ τζ2)分别为 Z轴上的另一个声音采集点的增益系数和延 ϋ信号, τ是指两个相邻声音釆集点之间的距离与声音传输速度之间的比值。
结合第一方面, 在第三种可能的实现方式中, 在根据所述声源信号的坐 标位置, 得到所述声源信号对应每一个坐标轴的差分信号之后, 以及在获取 扬声器的位置信息, 根据所述扬声器的位置信息、 所述对应每一个坐标轴的 差分信号, 得到所述扬声器的声音输出信号之前, 还包括:
获取表征所述声源信号强度的幅度值;
根据所述声源信号的幅度值以及声源信号在所述预设三维极坐标中的坐 标位置, 得到所述声源信号的幅度值对应在每一个坐标轴上的差分信号的幅 度值。
结合第三种可能的实现方式, 在第四种可能的实现方式中, 通过如下公 式确定所述声源信号 (0的幅度值对应在每个坐标轴上的差分信号的幅度 值: x| =u*cos(<9)*cos( )卞 |
_y| =u*sin( )*cosO)卞 ( |
|z| =u*sin(^)* 1^( 1 其中, 为声源信号的幅度值, |为差分信号 J在 X轴上的幅度值,
^为差分信号 >在 Υ轴上的幅度值, |ζ|为差分信号 ζ在 Ζ轴上的幅度值, U 为幅度衰减系数, ø表示声源信号的坐标与所述三维极坐标的坐标原点之间的 连线在 ΧΟΥ平面上的投影与 X轴的夹角 , 表示所述连线与 ΧΟΥ平面之间 的夹角。
结合第一方面, 在第五种可能的实现方式中, 获取扬声器的位置信息, 根据所述扬声器的位置信息、 所述对应每一个坐标轴的差分信号, 得到所述 扬声器的声音输出信号, 包括:
确定所述扬声器在所处区域中的位置信息 ( , 其中, 为扬声器与 坐标原点之间的连线在水平面上的投影与 X轴的夹角, 为所述连线与 Z轴 之间的夹角;
通过如下公式将扬声器的位置信息与每个坐标轴上的差分信号进行处 理, 生成所述扬声器对应的声音输出信号:
m(k) = 0.5* [S(t) + x * cos((¾ ) cos( ) + y * sin((¾ ) cos(^ ) + sin(^ )] 其中, 表示各扬声器的输出信号。
本发明实施例第二方面提供了一种声音信号处理装置, 包括:
获取模块, 用于获取设定区域内的声源信号以及所述设定区域的图像信 息;
确定模块, 用于根据所述图像信息得到三维极坐标中所述声源信号的坐 标位置;
第一处理模块, 用于根据所述声源信号的坐标位置, 得到所述声源信号 对应每一个坐标轴的差分信号; 第二处理模块, 用于获取扬声器的位置信息, 根据所述扬声器的位置信 息、 所述对应每一个坐标轴的差分信号, 得到所述扬声器的声音输出信号。
结合第二方面, 在第一种可能的实现方式中, 第一处理模块, 还用于获 取表征所述声源信号强度的幅度值, 根据所述声源信号的幅度值以及声源信 号在所述预设三维极坐标中的坐标位置, 得到所述声源信号的幅度值对应在 每一个坐标轴上的差分信号的幅度值。
本发明实施例第三方面提供了一种声音信号处理设备, 包括:
声音采集器, 用于获取设定区域内的声音信号;
图像釆集器, 用于获取所述设定区域内的图像信息;
处理器, 用于根据所述图像信息得到三维极坐标中所述声源信号的坐标 位置, 根据所述声源信号的坐标位置, 得到所述声源信号对应每一个坐标轴 的差分信号, 获取扬声器的位置信息, 根据所述扬声器的位置信息、 所述对 应每一个坐标轴的差分信号, 得到所述扬声器的声音输出信号。
在本发明实施例中获取设定区域内的声源信号以及设定区域的图像信 息, 根据图像信息得到三维极坐标中声源信号的坐标位置, 根据声源信号的 坐标位置, 得到声源信号对应每一个坐标轴的差分信号, 获取扬声器的位置 信息, 根据扬声器的位置信息以及得到的每一个坐标轴的差分信号, 得到扬 声器的声音输出信号, 最后扬声器根据得到的输出声音信号进行输出。 这样 通过多个扬声器实现 3D音效果的同时, 还能通过 3D声效果反映出说话者的 发声位置以及声音的大小, 避免了现有技术中 3D声效果单一的问题。 附图说明
图 1 为本发明实施例中一种声音信号处理方法的流程图;
图 2 为本发明实施例中在第一设定区域内的坐标定位的示意图; 图 3为本发明实施例中声音信号采集的示意图;
图 4为本发明实施例中第二设定区域中扬声器设置位置的示意图; 图 5为本发明实施例中扬声器在第二设定区域中的坐标位置示意图; 图 6为本发明实施例中一种声音信号处理装置的结构示意图;
图 7为本发明实施例中一种声音信号处理设备的结构示意图。 具体实施方式
下面将结合本发明实施例中的附图, 对本发明实施例中的技术方案进行 清楚、 完整地描述, 显然, 所描述的实施例是本发明一部分实施例, 而不是 全部的实施例。 基于本发明中的实施例, 本领域普通技术人员在没有做出创 造性劳动前提下所获得的所有其他实施例, 都属于本发明保护的范围。
当前 3D音频技术的应用越来越广泛, 比如说在游戏、 电影、 会议中都可 以应用到 3D音频技术, 通过 3D音频技术可以让用户更加有真实感, 从而提 升用户在声音上的体验度, 尤其是在视频会议中, 应用 3D音频技术之后不仅 可以让用户有更加真实的感受, 并且还能够使说话者的声音信息更加具有辨 识度。
但是,在当前的 3D音频技术一般是通过两路声音信号的增益以及延迟调 节来完成, 这使得 3D音频技术的实现方式单一, 并且这种单一的实现方式使 得 3D音效的真实感较低。
针对上述问题, 本发明实施例提供了一种声音信号处理方法, 该方法包 括: 获取设定区域内的声源信号以及设定区域的图像信息, 根据图像信息得 到三维极坐标中声源信号的坐标位置, 根据所述声源信号的坐标位置, 得到 声源信号对应每一个坐标轴的差分信号, 获取扬声器的位置信息, 根据扬声 器的位置信息、 对应每一个坐标轴的差分信号, 得到扬声器的声音输出信号。
简单的来讲, 比如说说话者在第一间会议室说话, 而所有听众在第二间 会议室, 此时声音处理装置通过第一间会议室中的三维极坐标确定说话者在 第一间会议室说话的坐标位置, 并将说话者的声源信号分解为每个坐标轴上 的差分信号, 然后结合第二间会议室中扬声器所处的位置确定扬声器应该输 出的差分信号, 从而通过扬声器的组合播放, 使得用户可以通过扬声器输出 的声音信号判定出说话者在第一间会议室中的说话位置,这样就实现了 3D音 频技术的侵入式效果。
下面通过附图以及具体实施例对本发明技术方案做详细的说明, 应当理 解, 本发明实施例只是对本发明技术方案的详细说明而不是限定, 在不冲突 的情况下本发明实施例以及实施例中的具体技术特征可以相互组合。
实施例一:
如图 1 所示为本发明实施例中一种声音信号处理方法的流程图, 该方法 包括:
5101 , 获取设定区域内的声源信号以及设定区域的图像信息;
首先来讲, 在设定区域中设置有一声音处理设备, 在该声音处理设备中 至少包括了声音采集装置以及图像采集装置, 在本发明实施例中该声音采集 装置为全向麦克风, 用来来采集设定区域内的声源信号, 图像采集装置为全 向摄像头, 用来釆集设定区域内的图像信息。
5102 , 根据图像信息得到三维极坐标中声源信号的坐标位置;
当用户在设定区域内发出一声音信号时, 声音采集装置将采集到说话者 的声源信号 同时图像采集装置将采集设定区域中的图像信息, 然后基 于对图像的分析, 得到声源信号在设定区域内的三维极坐标中的坐标位置。
具体来讲, 在设定区域中的全向摄像头会拍摄一张设定区域中的全景图 像, 基于设定区域中的三维极坐标中的坐标原点确定出说话者在设定区域中 的坐标位置, 比如说如图 2所示, 在图 2 中说话者在设定区域中说话者的位 置坐标可以使用极坐标( r , θ , ) 来表示, 其中, r表示说话者与坐标原 点之间的距离, 0表示说话者的声源位置与坐标原点之间的连线在 XOY平面 上的投影与 X轴的夹角, 0表示 XOY平面上的投影与连线之间的夹角。 由于 声源信号 是由说话者发出, 因此说话者的位置确定出来的同时声源信号
^( 的位置就相应的确定出来。
5103 , 根据声源信号的坐标位置, 得到声源信号对应每一个坐标轴的差 分信号; 在得到声源信号的坐标位置之后, 声音处理设备会将声源信号 5(0分解 到三维极坐标的每个坐标轴上, 即: 得到 X、 Υ、 Ζ坐标轴方向上的差分信号, 具体来讲, X、 Υ、 Ζ坐标轴方向上的差分信号可以才艮据自由声波传输原理, 即: 当一个点波传送到两个距离很近的虚拟的全向麦克风的信号的原理来得 到各个坐标轴上的差分信号, 比如说如图 3所示, 在图 3中一个全向麦克分 可以虚拟为两个距离很近的全向麦克风, 通过分别获取声源信号对应在坐标 轴上的两个相邻采集点上的第一声音釆样信号以及第二声音采样信号, 就可 以根据坐标轴上的第一声音采样信号以及第二声音采样信号的差值得到声源 信号 在坐标轴上的差分信号。
具体来讲, X轴或者是 Y上的差分信号可以但是不限于如下的公式(1 ) 得到:
L = sqrt{2) 12 * (gain * S(t -τη) - gainL2 *S(t-TL2))*K ( l ) 当 L表征 X轴上的差分信号时: ^∞¾ * S(t - rri )表示 X轴上一个声音 采集点采集到的第一声音采样信号, gunxl、 分别为 X轴上的一个声 音釆集点的增益系数以及延迟信号, g^ 2*S(i-T2)表示 X轴上另一声音釆 集点采集到的第二声音采样信号, gainx2、 rx2)分别为 X轴上的另一个 声音采集点的增益系数和延迟信号, K为复指数序列 e 。
进一步, 由于声源信号 所处位置可以通过极坐标( r, θ , 表示, 因此在 X轴上的第一声音釆样信号以及第二声音釆样信号在第一预设坐标系 中的增益系数以及延迟时间可以通过声源信号 S(0中极坐标来得到, 具体可 以通过如下公式得到:
gainxl =(5/4 + cos(^)) cos
gainx2 = (5 / 4 - cos(< )) cos(^) τχ1 =r*{-0.5 + sqrt[(514 - cos(< )) cos ] } r2 =τ* {-0.5 + sqrt[{514 + cos(<9)) cos(^)] } 这里的 T是指两个虚拟全向麦克风的采集中心点之间的距离与声音传输 速度之间的比值, 即: 两个虚拟全向麦克风中间的声音延迟时间, 1表示声 源信号 传输到 X轴上的一个声音采集点的时间, ^2表示声源信号 传 输到 X轴上的另一个声音采集点的时间。
当 L表征 Y轴上的差分信号时: - ) 表示 Y轴上一个声音 采集点采集到的第一声音采样信号, gamyl、 分别为 Y轴上的一个 声音采集点的增益系数以及延迟信号, gainyl * S(t _ T 2)表示 Y轴上另一声音 釆集点釆集到的第二声音釆样信号, gnny2、 S( _^2)分别为 Y轴上的另一 个声音采集点的增益系数和延迟信号, K为复指数序列 ej 、。
进一步, 由于声源信号 所处位置可以通过极坐标( r, θ , 0 )表示, 因此在 Y轴上的第一声音采样信号以及第二声源采样信号中的增益系数以及 延迟时间可以通过声源信号 的极坐标来得到, 具体可以通过如下公式得 到:
g inyi = (5 / 4 + sin(^)) cos(^)
gainy2 = (5 / 4— sin(^)) cos(^)
Tyi =T* _o.5 + sqrt^ / 4 - si ) cos(^)] }
= r * {_0.5 + sqrt[{514 + sin(6>)) cos(^)] } 表示声源信号 S(0传输到 X轴上的一个声音釆集点的延迟时间, T yi 表示声源信号 传输到 X轴上的另一个声音采集点的时间。
对于 Z轴上的差分信号可以通过公式(2)得到, 具体如下:
z = sqrtil 12 * (gainzl * S(t -τζ1) - gainz2 * S(i - rz2》 ( 2 )
其中, g" zl*S( - 表示 Z轴上一个声音釆集点釆集到的第一声音釆 样信号, gai 、 rzl)分别为 Z轴上的一个声音采集点的增益系数以及 延迟信号, gainz2 * S( - rz2 )表示 z轴上另一声音采集点釆集到的第二声音采 样信号, gainz2、 S( _rz2)分别为 Z轴上的另一个声音采集点的增益系数和 延迟信号。
进一步, 由于声源信号 所处位置可以通过极坐标( r , θ , 表示, 因此在 Z轴上的第一声音釆样信号以及第二声源釆样信号中的增益系数以及 延迟时间可以通过声源信号 S(0中极坐标来得到, 具体可以通过如下公式得 到:
gainzl =5/4 + sin(^) gainz2 =5/4- sin(^) τζ1 =τ*[-0.5 + sqrt{514 - sin )] τζ1 = [-0.5 + sqrt{514 + sin )] 1表示声源信号 S(0传输到 X轴上的一个声音采集点的时间, 2表示 声源信号 传输到 X轴上的另一个声音采集点的时间。
通过上述的处理过程可以得到的声源信号在三维极坐标中每个坐标轴上 的差分信号, 在得到每个坐标轴上的差分信号之后, 需要根据声源信号的幅 度值来调整每个坐标轴上的差分信号的幅度值。
具体来讲, 首先获取声源信号的幅度值, 根据声源信号的幅度值以及声 源信号在三维极坐标中的坐标位置, 确定声源信号的幅度值对应在每个坐标 轴上的幅度值, 最后根据各个坐标轴上的幅度值, 调整各个坐标轴的差分信 号的幅度值, 具体的调整方式可以但是不限于通过如下的公式来得到:
=u*cos(<9)*cosO) (0
Figure imgf000011_0001
z =u*sin(» () 其中, S )为声源信号的幅度值, x为差分信号 在 X轴上的幅度值,
: Η为差分信号 在 γ轴上的幅度值, |ζ|为差分信号 ζ在 ζ轴上的幅度值, u 为幅度衰减系数。
在得到 X、 Υ、 Ζ坐标轴上的幅度值之后, 根据得到的幅度值来调整各个 坐标轴上对应差分信号的幅度值, 即: X轴上的差分信号的幅度值调整为 X γ轴上的差分信号的幅度值调整为 ,z轴上的差分信号的幅度值调整为 Ζ 在得到各个坐标轴上调整幅度之后的差分信号时, 该声音处理设备将执 行步骤 S104。
S104, 获取扬声器的位置信息, 根据扬声器的位置信息、 对应每一个坐 标轴的差分信号, 得到扬声器的声音输出信号;
在得到每个坐标轴上的差分信号之后, 需要基于扬声器的位置信息确定 扬声器应该对应的声音输出信号。
具体来讲, 在本发明实施例中扬声器所处的区域中也存在三维极坐标, 此时扬声器的位置坐标也可以通过极坐标来表征, 比如说如图 4所示, 在图 4 中设置的是三维极坐标, 即包括 X轴、 Y轴以及 Z轴, 扬声器在三维极坐标 中的位置应该是图 5所示的位置关系, 在图 5中, 扬声器处于三维极坐标中, 此时扬声器的位置可以通过极坐标( r, Θ, , ^ )来表征, 其中, r用来表征 扬声器与极坐标系中坐标原点之间的距离, 表示扬声器与坐标原点之间的 连线在 XOY平面上的投影线与 X轴之间的夹角, 投影线与所述连线之间的 夹角, 图 5中的每个扬声器都可以通过极坐标来表征。
基于扬声器的极坐标, 通过公式 (3 )得到扬声器的声音输出信号: m{k) = 0.5* [S(t) + cos(<¾ ) cos( ) + y sin( ) cos(^ ) + z sin( )] (3 ) 其中, 表示各扬声器的输出信号。
比如说第一扬声器的极坐标为 ( r , , 则该第一扬声器的输出信 号就是: = 0.5 * [S(t) + x * cos(/¾ ) cos(^ ) + y sin(6> ) cos(¾ ) + z sin(^ )] , 当然, 若是区域中存在多个扬声器时, 可以通过公式(3 )得到每个扬声器对应的声 音输出信号, 此处就不再赘述。
在得到扬声器的输出信号之后, 为了保证扬声器输出信号的效果, 根据 说话者的方位和扬声器的布局, 选择并计算扬声器的增益, 并得到增益调整 后的输出信号, 其增益调整后的输出信号如下:
m(k) = 0.5 * gain * [S(t) + x * cos ( ) cos(¾ ) + y sin( ) cos(<¾ ) + z sin(^¾ )] 在得到每个扬声器增益调整后的输出信号之后, 此时就将每个扬声器的 声音输出信号发送至对应的扬声器, 从而该扬声器就按照得到的输出信号进 行输出。
通过不同位置的扬声器输出的不同的声音输出信号, 就可以在区域中形 成 3D声效, 并且能够比较精确的还原声源信号, 从而让听者能够感受到声源 信号的位置, 以及感受到声源的位置改变以及声源信号的强度改变, 进而实 现了对声源信号在各个方位上的立体化效果, 提升了 3D声效的呈现效果。
另外, 在本发明实施例中当说话者的位置发生改变, 也就是声源位置发 生改变时, 则扬声器的声音输出信号也同时改变, 这样就可以通过扬声器输 出信号随着声源位置的改变而改变来反映出声源信号的位置改变, 进而在提 升了 3D声效的呈现效果的基础上,听者还可以通过扬声器输出的声音输出信 号随时感受到说话者在第一设定区域中的发声位置, 实现了 3D侵入式声效, 提升了用户体验。
另外, 在上述实施例中在设定区域中是通过全向麦克风将声源信号分别 分解到 3个坐标轴上的差分信号, 当然在实际的场景中一般的麦克风只能是 釆集两路信号, 因此在本发明实施例中还可以将声源信号分解到两个坐标轴 上的差分信号, 并通过这两个差分信号得到最后发送至扬声器的声音输出信 号, 具体的实现方式如下:
首先来讲, 说话者还是处于设定区域中, 并且在设定区域中存在三维极 坐标, 此时声源信号的位置可以通过三维极坐标来表示, 即: ( r, θ, φ ), 根据声源信号的极坐标就可以得到声源信号在 X轴上的差分信号, 其 X轴的 声音信号釆集方式与上述实施例中的釆集方式相同, 即: 将一个麦克风虚拟 为两个相邻的麦克风采集点, 从而得到两次采集的声音信号, 通过两次采集 的声音信号得到 X轴上的差分信号为:
X = sqrtil) 12 * (gainxl ^S(t-Txl)- gainx2 "S(t-Txl - τχ,》 其中, ¾表征声源信号在 χ轴上的第一声音采集点增益, S -D表 征声源信号在 X轴上的第一声音采集点的延迟信号, g« 2表征声源信号在 X 轴上的第二声音采集点增益, S(t-Tx2_ 3)表征声源信号在 X轴上的第二声音 采集点的延迟信号。
进一步,由于声源信号 所处位置可以通过三维坐标系中的极坐标( Θ )表征, 因此 X轴上的增益以及延迟时间可以通过声源信号的极坐标来获 取, 其具体的可以通过如下公式获取:
gainxl =5/4 + cos
gainxl =5/4- cos(^)
Τ 1 =r*[-0.5 + ^r (5/4-cos(< ))]
Tx2 = T * [-0.5 + sqrt(514 + cos((9))]
这里的 τ是指两个虚拟全向麦克风的采集点之间的距离与声音传输速度 之间的比值, 即: 两个虚拟全向麦克风中间的声音延迟时间, rxl表示声源信 号 传输到 X轴上的一个声音采集点的时间, Tx2表示声源信号 传输 到 X轴上的另一个声音采集点的时间。
这样得到的 X轴是的差分信号指向为心形。
基于获取 X轴上差分信号的原理, 同样可以通过如下公式获取到 Y轴上 的差分信号:
y = W 2) 12 * (gainyl * S(t -τγ1) - gainy2 * S(t - Ty2 )) 其中, ^ ^表征声源信号在 x轴上的第一声音采集点增益, >S(i - )表 征声源信号在 γ轴上的第一声音釆集点的延迟信号, giw^2表征声源信号在
Y轴上的第二声音采集点增益, ry2)表征声源信号在 γ轴上的第二声音 采集点的延迟信号。
进一步,由于声源信号 所处位置可以通过三维坐标系中的极坐标( r, Θ )表征, 因此 Y轴上的增益以及延迟时间可以通过声源信号的极坐标来获 取, 其具体的可以通过如下公式获取:
g inyl =5/4 + sin(^)
gainy2 - 51 A- sin(^)
Tyl = r * [-0.5 + S^ri(5 / 4 - sin(^))]
Tj2 = r * [_0·5 + sqrt{514 + sin(^))] 表示声源信号 S(0传输到 Y轴上的一个声音采集点的时间, rw表示 声源信号 传输到 γ轴上的另一个声音采集点的时间。
这样得到的 Y轴是的差分信号指向为 8字形
由于通过上述方法得到的基于三维坐标系得到的两路差分信号, 因此得 到的两个差分信号提供给另一区域中的两个扬声器输出, 在另一区域中的两 个扬声器的输出信号可以通过如下公式得到:
1 = 0.5^(^ + 3;)
R = .5^(x-y)
其中, 表征相对于坐标原点的左边的扬声器的输出信号, 当然;?表征相 对于坐标原点的右边扬声器的输出信号。
当然, 得到两个扬声器中每个扬声器的输出信号之后, 还需要对得到的 输出信号进行增益调整, 此处的增益调整可以才艮据实际的应用场景来调整, 即: 增益可调高也可以调低。 在调整好输出信号的增益之后, 得到增益调整 后的输出信号:
L = 0.5 * gain ^ (x + y)
R = 0.5 * gain * (x - y)
将最终的输出信号发送至对应的扬声器,具体来讲就是将 £输出信号发送 至相对于坐标原点坐标的左边的扬声器, 而 7?输出信号发送至相对于坐标原 点的右边的扬声器。 最后通过左右两边的扬声器的输出信号在第二设定区域 中混合形成 3D音效, 增加了 3D声效的实现方式。
并且在本发明实施例中在说话者说话位置的改变, 即: 声源位置的改变, 此时每个扬声器输出信号也会同样的改变的情况下 , 扬声器输出信号也会相 应的调整, 然后通过扬声器输出调整后的输出信号来形成不同的 3D音效, 从 而使得听者可以感受到声音信号位置的改变, 提升了用户的使用体验。
实施例二:
对应本发明实施例一中的一种声音信号处理方法, 如图 6所示, 本发明 实施例还提供了一种声音信号处理装置, 该装置包括:
获取模块 601 , 用于获取设定区域内的声源信号以及设定区域的图像信 息;
确定模块 602, 用于根据图像信息得到三维极坐标中声源信号的坐标位 置;
第一处理模块 603 , 用于根据声源信号的坐标位置, 得到声源信号对应每 一个坐标轴的差分信号;
第二处理模块 604,用于获取扬声器的位置信息,根据扬声器的位置信息、 对应每一个坐标轴的差分信号, 得到扬声器的声音输出信号。
进一步, 第一处理模块 603 , 还用于获取表征声源信号强度的幅度值, 根 据声源信号的幅度值以及声源信号在预设三维极坐标中的坐标位置, 得到声 源信号的幅度值对应在每一个坐标轴上的差分信号的幅度值。
第二处理模块 604, 具体用于确定扬声器在所处区域中的位置信息
{θ,, φ , 其中, 为扬声器与坐标原点之间的连线在水平面上的投影与 X轴 的夹角, 为连线与 Ζ轴之间的夹角, 通过如下公式将扬声器的位置信息与 每个坐标轴上的差分信号进行处理, 生成扬声器对应的声音输出信号:
m{k) = 0.5 * [S(t) + x * cos(<¾ ) cos( ) + _y * sin(<¾ ) cos(^ ) + z * sin(^ )]其 中, 表示各扬声器的输出信号。
另外, 本发明实施例中还提供了一种声音信号处理设备, 如图 7所示为 本发明实施例中一种声音信号处理设备的结构示意图, 该设备包括:
声音采集器 701 , 用于获取设定区域内的声音信号;
图像采集器 702, 用于获取设定区域内的图像信息;
处理器 703 , 用于根据图像信息得到三维极坐标中声源信号的坐标位置, 根据声源信号的坐标位置, 得到声源信号对应每一个坐标轴的差分信号, 获 取扬声器的位置信息, 根据扬声器的位置信息、 对应每一个坐标轴的差分信 号, 得到扬声器的声音输出信号。
进一步, 处理器 703, 具体用于通过如下公式得到 X轴以及 Y轴上的差 分信号:
L = sqrt(2) 12 * (gainn * S(t - τη ) - gainL2 * S(t― TL2 )) * K gainLl、 分别为 x轴或者是 Y轴上的一个声音采集点的增益系 数以及延迟信号, g iinL2、 S(t - Ti2)分别为 X轴或者是 Y轴上的另一个声音 采集点的增益系数和延迟信号, τ是指两个相邻声音采集点之间的距离与声 音传输速度之间的比值, Κ为复指数序列。
处理器 703 , 具体用于通过如下公式得到 Ζ轴上的差分信号:
z = sqrt(l) 12 * {gainzl ^ S(t - zl) - gainz2 * S(t - τζ2 )) g inzl、 S - rzl)分别为 Z轴上的一个声音采集点的增益系数以及延迟 信号, gainz2、 S(Y_rz2)分别为 z轴上的另一个声音釆集点的增益系数和延 迟信号, τ是指两个相邻声音采集点之间的距离与声音传输速度之间的比值。
进一步, 处理器 703 , 还用于获取表征声源信号强度的幅度值, 根据声源 信号的幅度值以及声源信号在预设三维极坐标中的坐标位置, 得到声源信号 的幅度值对应在每一个坐标轴上的差分信号的幅度值。
进一步,处理器 703 ,还用于确定扬声器在所处区域中的位置信息 ( , 其中, 为扬声器与坐标原点之间的连线在水平面上的投影与 X轴的夹角, 为连线与 Z轴之间的夹角, 通过如下公式将扬声器的位置信息与每个坐标轴 上的差分信号进行处理, 生成扬声器对应的声音输出信
号: ( ) = 0.5* [S{t) + cos(<¾ ) cos(^ ) + sin( ) cos(^ ) + sin(^ )]其中, (k)表示各扬声器的输出信号。
本领域内的技术人员应明白, 本发明的实施例可提供为方法、 系统、 或 计算机程序产品。 因此, 本发明可釆用完全硬件实施例、 完全软件实施例、 或结合软件和硬件方面的实施例的形式。 而且, 本发明可采用在一个或多个 其中包含有计算机可用程序代码的计算机可用存储介质 (包括但不限于磁盘 存储器、 CD-ROM、 光学存储器等) 上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、 设备(系统)、 和计算机程序产 品的流程图和 /或方框图来描述的。 应理解可由计算机程序指令实现流程图 和 /或方框图中的每一流程和 /或方框、 以及流程图和 /或方框图中的流程 和 /或方框的结合。 可提供这些计算机程序指令到通用计算机、 专用计算机、 嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器, 使得通 过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流 程图一个流程或多个流程和 /或方框图一个方框或多个方框中指定的功能的 装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设 备以特定方式工作的计算机可读存储器中, 使得存储在该计算机可读存储器 中的指令产生包括指令装置的制造品, 该指令装置实现在流程图一个流程或 多个流程和 /或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上, 使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的 处理, 从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图 一个流程或多个流程和 /或方框图一个方框或多个方框中指定的功能的步 骤。
尽管已描述了本发明的优选实施例, 但本领域内的技术人员一旦得知了 基本创造性概念, 则可对这些实施例作出另外的变更和修改。 所以, 所附权 利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。
显然, 本领域的技术人员可以对本发明实施例进行各种改动和变型而不 脱离本发明实施例的精神和范围。 这样, 倘若本发明实施例的这些修改和变 型属于本发明权利要求及其等同技术的范围之内, 则本发明也意图包含这些 改动和变型在内。

Claims

权 利 要 求
1、 一种声音信号处理方法, 其特征在于, 包括:
获取设定区域内的声源信号以及所述设定区域的图像信息;
根据所述图像信息得到三维极坐标中所述声源信号的坐标位置; 根据所述声源信号的坐标位置, 得到所述声源信号对应每一个坐标轴的 差分信号;
获取扬声器的位置信息, 根据所述扬声器的位置信息、 所述对应每一个 坐标轴的差分信号, 得到所述扬声器的声音输出信号。
2、 如权利要求 1所述的方法, 其特征在于, 在 X轴以及 Y轴上的差分 信号 L可以通过如下公式得到:
L = sqrt(2) 12 * (gainn * S(t - T ) - gainL2 * S(t - TL2)) * K gainLl、 分别为 x轴或者是 Y轴上的一个声音采集点的增益系 数以及延迟信号, gamL2、 S(t 2)分别为 X轴或者是 Y轴上的另一个声音 采集点的增益系数和延迟信号, τ是指两个相邻声音采集点之间的距离与声 音传输速度之间的比值, Κ为复指数序列。
3、 如权利要求 1所述的方法, 其特征在于, 在 Ζ轴上的差分信号 ζ可以 通过如下公式得到:
z = sqrt{2) 12 * {gainzl ^ S{t - Tzl) - gainz2 * S(t - τζ2 )) g inzl , rzl )分别为 Ζ轴上的一个声音釆集点的增益系数以及延迟 信号, gainzl、 S( _ rz2)分别为 z轴上的另一个声音釆集点的增益系数和延 迟信号, τ是指两个相邻声音釆集点之间的距离与声音传输速度之间的比值。
4、 如权利要求 1所述的方法, 其特征在于, 在根据所述声源信号的坐标 位置, 得到所述声源信号对应每一个坐标轴的差分信号之后, 以及在获取扬 声器的位置信息, 根据所述扬声器的位置信息、 所述对应每一个坐标轴的差 分信号, 得到所述扬声器的声音输出信号之前, 还包括: 获取表征所述声源信号强度的幅度值;
根据所述声源信号的幅度值以及声源信号在所述预设三维极坐标中的坐 标位置, 得到所述声源信号的幅度值对应在每一个坐标轴上的差分信号的幅 度值。
5、 如权利要求 4所述的方法, 其特征在于, 通过如下公式确定所述声源 信号 5(0的幅度值对应在每个坐标轴上的差分信号的幅度值:
|x| =u*cos( )*cos(»卞 (0|
| ^| =u*sin(6')*cos(^)* 1^( 1
|z| =u*sin(^)*|5( |
其中, |So)|为声源信号的幅度值, |^|为差分信号 在 X轴上的幅度值,
W为差分信号 在 γ轴上的幅度值, |z|为差分信号 z在 z轴上的幅度值, u 为幅度衰减系数, ø表示声源信号的坐标与所述三维极坐标的坐标原点之间的 连线在 XOY平面上的投影与 X轴的夹角 , ^表示所述连线与 XOY平面之间 的夹角。
6、 如权利要求 1所述的方法, 其特征在于, 获取扬声器的位置信息, 根 据所述扬声器的位置信息、 所述对应每一个坐标轴的差分信号, 得到所述扬 声器的声音输出信号, 包括:
确定所述扬声器在所处区域中的位置信息 ( , 其中, 为扬声器与 坐标原点之间的连线在水平面上的投影与 X轴的夹角, 为所述连线与 Z轴 之间的夹角;
通过如下公式将扬声器的位置信息与每个坐标轴上的差分信号进行处 理, 生成所述扬声器对应的声音输出信号:
m(k) = 0.5* [S(t) + cos(<¾ ) cos(¾ ) + _y * sin(/¾ ) cos(^ ) + z * sin(<pk )] 其中, ( :)表示各扬声器的输出信号。
7、 一种声音信号处理装置, 其特征在于, 包括: 获取模块, 用于获取设定区域内的声源信号以及所述设定区域的图像信 息;
确定模块, 用于根据所述图像信息得到三维极坐标中所述声源信号的坐 标位置;
第一处理模块, 用于根据所述声源信号的坐标位置, 得到所述声源信号 对应每一个坐标轴的差分信号;
第二处理模块, 用于获取扬声器的位置信息, 根据所述扬声器的位置信 息、 所述对应每一个坐标轴的差分信号, 得到所述扬声器的声音输出信号。
8如权利要求 7所述的装置, 其特征在于, 第一处理模块, 还用于获取表 征所述声源信号强度的幅度值, 根据所述声源信号的幅度值以及声源信号在 所述预设三维极坐标中的坐标位置, 得到所述声源信号的幅度值对应在每一 个坐标轴上的差分信号的幅度值。
9、 一种声音信号处理设备, 其特征在于, 包括:
声音采集器, 用于获取设定区域内的声音信号;
图像采集器, 用于获取所述设定区域内的图像信息;
处理器, 用于根据所述图像信息得到三维极坐标中所述声源信号的坐标 位置, 根据所述声源信号的坐标位置, 得到所述声源信号对应每一个坐标轴 的差分信号, 获取扬声器的位置信息, 根据所述扬声器的位置信息、 所述对 应每一个坐标轴的差分信号, 得到所述扬声器的声音输出信号。
PCT/CN2014/081511 2013-12-20 2014-07-02 一种声音处理方法、装置及设备 WO2015090039A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310714608.6A CN104735582B (zh) 2013-12-20 2013-12-20 一种声音信号处理方法、装置及设备
CN201310714608.6 2013-12-20

Publications (1)

Publication Number Publication Date
WO2015090039A1 true WO2015090039A1 (zh) 2015-06-25

Family

ID=53402054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/081511 WO2015090039A1 (zh) 2013-12-20 2014-07-02 一种声音处理方法、装置及设备

Country Status (2)

Country Link
CN (1) CN104735582B (zh)
WO (1) WO2015090039A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105879390A (zh) * 2016-04-26 2016-08-24 乐视控股(北京)有限公司 虚拟现实游戏处理方法及设备
CN109474881B (zh) * 2018-01-22 2020-10-16 国网浙江桐乡市供电有限公司 一种三维实景配现场音的方法及系统
CN114615534A (zh) * 2022-01-27 2022-06-10 海信视像科技股份有限公司 显示设备及音频处理方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998001197A1 (en) * 1996-07-04 1998-01-15 Central Research Laboratories Limited Sound effect mechanism
JPH10191498A (ja) * 1996-12-27 1998-07-21 Matsushita Electric Ind Co Ltd 音信号処理装置
JP2003023699A (ja) * 2001-07-05 2003-01-24 Saibuaasu:Kk 空間情報聴覚化装置および空間情報聴覚化方法
JP2003348700A (ja) * 2002-05-28 2003-12-05 Victor Co Of Japan Ltd 臨場感信号の生成方法、及び臨場感信号生成装置
CN101330585A (zh) * 2007-06-20 2008-12-24 深圳Tcl新技术有限公司 一种声音定位的方法及系统
CN101459797A (zh) * 2007-12-14 2009-06-17 深圳Tcl新技术有限公司 一种声音定位的方法及系统
CN103118322A (zh) * 2012-12-27 2013-05-22 新奥特(北京)视频技术有限公司 一种环绕声声像处理系统

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100556151C (zh) * 2006-12-30 2009-10-28 华为技术有限公司 一种视频终端以及一种音频码流处理方法
US8509454B2 (en) * 2007-11-01 2013-08-13 Nokia Corporation Focusing on a portion of an audio scene for an audio signal
CN101350931B (zh) * 2008-08-27 2011-09-14 华为终端有限公司 音频信号的生成、播放方法及装置、处理系统
US8111843B2 (en) * 2008-11-11 2012-02-07 Motorola Solutions, Inc. Compensation for nonuniform delayed group communications
CN203151672U (zh) * 2013-03-21 2013-08-21 徐华中 一种具有声源定位功能的视频系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998001197A1 (en) * 1996-07-04 1998-01-15 Central Research Laboratories Limited Sound effect mechanism
JPH10191498A (ja) * 1996-12-27 1998-07-21 Matsushita Electric Ind Co Ltd 音信号処理装置
JP2003023699A (ja) * 2001-07-05 2003-01-24 Saibuaasu:Kk 空間情報聴覚化装置および空間情報聴覚化方法
JP2003348700A (ja) * 2002-05-28 2003-12-05 Victor Co Of Japan Ltd 臨場感信号の生成方法、及び臨場感信号生成装置
CN101330585A (zh) * 2007-06-20 2008-12-24 深圳Tcl新技术有限公司 一种声音定位的方法及系统
CN101459797A (zh) * 2007-12-14 2009-06-17 深圳Tcl新技术有限公司 一种声音定位的方法及系统
CN103118322A (zh) * 2012-12-27 2013-05-22 新奥特(北京)视频技术有限公司 一种环绕声声像处理系统

Also Published As

Publication number Publication date
CN104735582A (zh) 2015-06-24
CN104735582B (zh) 2018-09-07

Similar Documents

Publication Publication Date Title
US11991315B2 (en) Audio conferencing using a distributed array of smartphones
CN111466124B (zh) 用于渲染用户的视听记录的方法,处理器系统和计算机可读介质
RU2586842C2 (ru) Устройство и способ преобразования первого параметрического пространственного аудиосигнала во второй параметрический пространственный аудиосигнал
KR101724514B1 (ko) 사운드 신호 처리 방법 및 장치
JP5603325B2 (ja) マイクロホン配列からのサラウンド・サウンド生成
CN106664501B (zh) 基于所通知的空间滤波的一致声学场景再现的系统、装置和方法
TW201820898A (zh) 用以再生空間分散聲音之方法
AU2017210021B2 (en) Synthesis of signals for immersive audio playback
KR20200040745A (ko) 다중-지점 음장 묘사를 이용하여 증강된 음장 묘사 또는 수정된 음장 묘사를 생성하기 위한 개념
US20150189455A1 (en) Transformation of multiple sound fields to generate a transformed reproduced sound field including modified reproductions of the multiple sound fields
WO2010022633A1 (zh) 音频信号的生成、播放方法及装置、处理系统
WO2015039439A1 (zh) 音频信号处理方法及装置、差分波束形成方法及装置
WO2015035785A1 (zh) 语音信号处理方法与装置
JP2009508442A (ja) オーディオ処理のためのシステムおよび方法
EP2974253A1 (en) Normalization of soundfield orientations based on auditory scene analysis
US11140507B2 (en) Rendering of spatial audio content
WO2010022658A1 (zh) 多视点媒体内容的发送和播放方法、装置及系统
Lee et al. A real-time audio system for adjusting the sweet spot to the listener's position
WO2015090039A1 (zh) 一种声音处理方法、装置及设备
Sun Immersive audio, capture, transport, and rendering: a review
Comminiello et al. Intelligent acoustic interfaces with multisensor acquisition for immersive reproduction
US11252528B2 (en) Low-frequency interchannel coherence control
CN113347530A (zh) 一种用于全景相机的全景音频处理方法
KR101111734B1 (ko) 복수 개의 음원을 구분하여 음향을 출력하는 방법 및 장치
US11589184B1 (en) Differential spatial rendering of audio sources

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14872687

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14872687

Country of ref document: EP

Kind code of ref document: A1