CN102972047B - Method and apparatus for reproducing stereophonic sound - Google Patents


Info

Publication number
CN102972047B
CN102972047B (application CN201180033247.8A)
Authority
CN
China
Prior art keywords
sound
power
depth information
signal
sound object
Prior art date
Legal status
Active
Application number
CN201180033247.8A
Other languages
Chinese (zh)
Other versions
CN102972047A (en)
Inventor
金善民
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Publication of CN102972047A
Application granted
Publication of CN102972047B


Classifications

    • H04S 5/02: Pseudo-stereo systems of the pseudo four-channel type, e.g. in which rear channel signals are derived from two-channel stereo signals
    • H04S 5/00: Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S 1/002: Two-channel systems; non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S 3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04R 5/00: Stereophonic arrangements
    • H04S 2400/11: Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H04S 2400/15: Aspects of sound capture and related signal processing for recording or reproduction

Abstract

A method and apparatus for reproducing stereophonic sound are provided. The method includes obtaining sound depth information, which denotes a distance between at least one sound object within a sound signal and a reference position, and providing sound perspective to the sound object output from a speaker, based on the sound depth information.

Description

Method and apparatus for reproducing stereophonic sound
Technical field
Apparatuses and methods consistent with exemplary embodiments relate to reproducing stereophonic sound, and more particularly, to reproducing stereophonic sound in which sound perspective is imparted to sound objects.
Background Art
With the development of video technology, users can now watch three-dimensional (3D) stereoscopic images. A 3D stereoscopic image exposes left-viewpoint image data to the left eye and right-viewpoint image data to the right eye, using various methods such as binocular parallax. With 3D video technology, users can realistically perceive an object that protrudes out of the screen or recedes back into it.
Stereophonic sound technology, on the other hand, arranges a plurality of speakers around the user so that the user can perceive the position and presence of sounds. With related-art stereophonic techniques, however, a sound associated with an image object that approaches the user, or moves away from the user, cannot be expressed effectively, so a sound effect that matches the stereoscopic image cannot be provided.
Summary of the invention
Technical Solution
Exemplary embodiments may overcome at least the problems and/or disadvantages described above, as well as other disadvantages not described above. However, an exemplary embodiment is not required to overcome the disadvantages described above, and may not overcome any of them.
One or more exemplary embodiments provide a method and apparatus for effectively reproducing stereophonic sound, and more particularly, a method and apparatus for effectively expressing a sound that approaches the user or moves away from the user, by imparting sound perspective to a sound object.
Advantageous Effects
According to the related art, the depth information of an image object must either be provided as additional information or be obtained by analyzing image data, so the depth information is difficult to obtain. According to exemplary embodiments, however, depth information can be produced by analyzing the sound signal itself, based on the fact that information about the position of an image object is reflected in the sound signal. The depth information of an image object can therefore be obtained easily.
In addition, the related art cannot adequately use a sound signal to express phenomena such as an image object protruding out of the screen or returning into it. According to exemplary embodiments, the sound object produced as an image object leaves or re-enters the screen is expressed, so the user can experience a more realistic stereophonic effect.
Furthermore, according to exemplary embodiments, the distance between the position where a sound object is produced and a reference position can be expressed effectively. In particular, because sound perspective is imparted to each individual sound object, the user can experience the stereophonic effect effectively.
Exemplary embodiments can be written as computer programs and implemented on general-purpose digital computers that execute the programs using a computer-readable recording medium.
Examples of the computer-readable recording medium include magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.) and optical recording media (e.g., CD-ROMs or DVDs).
The foregoing exemplary embodiments and advantages are merely exemplary and are not to be construed as limiting. The present teaching can be readily applied to other types of apparatuses. The description of the exemplary embodiments is intended to be illustrative, not to limit the scope of the claims, and many alternatives, modifications, and variations will be apparent to those skilled in the art.
Brief Description of the Drawings
The above and/or other aspects will become more apparent from the following description of certain exemplary embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a stereophonic sound reproducing apparatus according to an exemplary embodiment;
Fig. 2 is a block diagram of a sound depth information acquisition unit according to an exemplary embodiment;
Fig. 3 is a block diagram of a stereophonic sound reproducing apparatus that provides stereophonic sound using a two-channel sound signal, according to an exemplary embodiment;
Figs. 4A, 4B, 4C and 4D illustrate examples of providing stereophonic sound according to an exemplary embodiment;
Fig. 5 is a flowchart of a method of producing sound depth information based on a sound signal, according to an exemplary embodiment;
Figs. 6A, 6B, 6C and 6D illustrate examples of producing sound depth information from a sound signal, according to an exemplary embodiment; and
Fig. 7 is a flowchart of a method of reproducing stereophonic sound according to an exemplary embodiment.
Best Mode
According to an aspect of an exemplary embodiment, there is provided a method of reproducing stereophonic sound, the method including: obtaining sound depth information, which denotes a distance between at least one sound object within a sound signal and a reference position; and imparting sound perspective to the sound object based on the sound depth information.
The sound signal may be divided into a plurality of adjacent sections, and the obtaining of the sound depth information may include obtaining the sound depth information by comparing the sound signal in a previous section with the sound signal in the current section.
The obtaining of the sound depth information may include: calculating a power of each frequency band of each of the previous section and the current section; determining, based on the power of each frequency band, a frequency band whose power is equal to or greater than a predetermined value and which is shared with the adjacent section as a common frequency band; and obtaining the sound depth information based on a difference between the power of the common frequency band in the current section and the power of the common frequency band in the previous section.
The method may further include obtaining, from the sound signal, a center channel signal that is output to a center speaker, wherein the calculating of the power includes calculating the power of each frequency band based on the center channel signal.
The imparting of the sound perspective may include adjusting the power of the sound object based on the sound depth information.
The imparting of the sound perspective may include adjusting, based on the sound depth information, a gain and a delay time of a reflection signal produced when the sound object is reflected.
The imparting of the sound perspective may include adjusting the magnitude of a low-frequency band component of the sound object based on the sound depth information.
The imparting of the sound perspective may include adjusting a difference between a phase of the sound object to be output from a first speaker and a phase of the sound object to be output from a second speaker.
The method may further include outputting the sound object to which the sound perspective has been imparted, through a left surround speaker and a right surround speaker, or through a left front speaker and a right front speaker.
The method may further include positioning a sound stage outside the speakers by using the sound signal.
According to an aspect of another exemplary embodiment, there is provided a stereophonic sound reproducing apparatus including: an information acquisition unit that obtains sound depth information denoting a distance between at least one sound object within a sound signal and a reference position; and a perspective providing unit that imparts sound perspective to the sound object based on the sound depth information.
Mode for Invention
Certain exemplary embodiments are described below in greater detail with reference to the accompanying drawings.
In the following description, like drawing reference numerals are used for like elements, even in different drawings. The matters defined in the description, such as detailed construction and elements, are provided to assist in a comprehensive understanding of the exemplary embodiments. However, exemplary embodiments can be practiced without those specifically defined matters.
First, for convenience of description, the terminology used in the exemplary embodiments is defined.
A sound object denotes each sound element included in a sound signal. A single sound signal may include various sound objects. For example, a sound signal recorded live at an orchestra performance includes sound objects produced by various musical instruments, such as a guitar, a violin, an oboe, and so on.
A sound source is an object that produces a sound object, such as a musical instrument or a voice. In the exemplary embodiments, both the object that actually produces a sound object and the object that the user perceives as producing it are referred to as sound sources. For example, if an apple flies from the screen toward the user while the user is watching a movie, the sound produced by the flying apple (the sound object) is included in the sound signal. The sound object may be a recording of the sound actually produced when an apple is thrown, or a playback of a pre-recorded sound object. In either case, the user perceives that the apple produced the sound object, so the apple is also a sound source as defined in the exemplary embodiments.
Sound depth information denotes a distance between a sound object and a reference position. In detail, the sound depth information denotes the distance between the position where the sound object is produced (the position of its sound source) and the reference position.
In the above example, as the apple flies from the screen toward the user, the distance between the sound source and the user decreases. To express the approaching apple effectively, the position where the sound object corresponding to the image object is produced needs to be expressed as gradually approaching the user, and the information expressing this is the sound depth information.
The reference position may be any of various positions, such as the position of a predetermined sound source, the position of a speaker, or the position of the user.
Sound perspective is a sensation that the user experiences through a sound object. On hearing a sound object, the user perceives the position where it is produced, that is, the position of the sound source that produced it. The sensation of the distance from that position is referred to as sound perspective.
Hereinafter, exemplary embodiments are described with reference to the accompanying drawings.
Fig. 1 is a block diagram of a stereophonic sound reproducing apparatus 100 according to an exemplary embodiment.
The stereophonic sound reproducing apparatus 100 includes a sound depth information acquisition unit 110 and a perspective providing unit 120.
The sound depth information acquisition unit 110 obtains sound depth information for at least one sound object included in a sound signal. The sound signal includes the sounds produced by at least one sound source, and the sound depth information denotes the distance between the position where a sound is produced (for example, the position of the sound source) and a reference position.
The sound depth information may denote the absolute distance between an object and the reference position, or the relative distance of the object from the reference position. According to another exemplary embodiment, the sound depth information may denote the change in the distance between the sound object and the reference position.
The sound depth information acquisition unit 110 may obtain the sound depth information by analyzing the sound signal, by analyzing 3D image data, or from an image depth map. In the present exemplary embodiment, the description proceeds on the example in which the sound depth information acquisition unit 110 obtains the sound depth information by analyzing the sound signal.
The sound depth information acquisition unit 110 obtains the sound depth information by comparing each of a plurality of sections constituting the sound signal with its adjacent sections. The sound signal may be divided into sections in various ways; for example, it may be divided into groups of a predetermined number of samples. Each divided section may be referred to as a frame or a block. The sound depth information acquisition unit 110 is described in detail below with reference to Fig. 2.
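As an illustration of the section division described above, the following is a minimal sketch that splits a mono PCM signal (assumed to be a float NumPy array) into fixed-length sections; the section length of 2048 samples is an illustrative choice, not one fixed by the exemplary embodiments.

```python
import numpy as np

def split_into_sections(signal, section_len=2048):
    """Split a 1-D sound signal into consecutive equal-length sections.

    Each section (frame/block) is later compared with its neighbour to
    detect per-band power changes. Trailing samples that do not fill a
    whole section are dropped.
    """
    n_sections = len(signal) // section_len
    return signal[:n_sections * section_len].reshape(n_sections, section_len)
```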
The perspective providing unit 120 processes the sound signal based on the sound depth information so that the user can experience sound perspective. To this end, the perspective providing unit 120 performs the operations described below; however, these operations are examples, and exemplary embodiments are not limited thereto.
The perspective providing unit 120 adjusts the power of a sound object based on the sound depth information: the closer to the user the sound object is produced, the larger the power of the sound object.
The perspective providing unit 120 adjusts the gain and the delay time of a reflection signal based on the sound depth information. The user hears both the direct sound signal, which reaches the user without being reflected by an obstacle, and the reflected sound signal, which is produced when the sound is reflected by an obstacle. The reflected sound signal has a smaller amplitude than the direct sound signal and arrives at the user delayed by a certain amount of time. In particular, when a sound object is produced near the user, the reflected sound signal arrives considerably later than the direct sound signal, with a considerably smaller amplitude.
The perspective providing unit 120 adjusts the low-frequency band component of a sound object based on the sound depth information. When a sound object is produced near the user, the user perceives its low-frequency band component as prominent.
The perspective providing unit 120 adjusts the phase of a sound object based on the sound depth information. The larger the difference between the phase of the sound object output from a first speaker and the phase of the sound object output from a second speaker, the closer the user perceives the sound object to be.
The operation of the perspective providing unit 120 is described in detail below with reference to Fig. 3.
Fig. 2 is a block diagram of the sound depth information acquisition unit 110 according to an exemplary embodiment.
The sound depth information acquisition unit 110 includes a power calculation unit 210, a determination unit 220, and a generation unit 230.
The power calculation unit 210 calculates the power of each frequency band in each of the sections constituting the sound signal.
The way the sizes of the frequency bands are determined may vary according to the exemplary embodiment. Two methods are described below, but exemplary embodiments are not limited thereto.
The frequency components of the sound signal may be divided into bands of identical width. The audible frequency range of humans is 20 Hz to 20,000 Hz; dividing it into ten identical bands gives bands approximately 2,000 Hz wide. This method of dividing the frequency range of the sound signal into identical bands may be referred to as the equivalent rectangular bandwidth division method.
Alternatively, the frequency components of the sound signal may be divided into bands of different widths. When listening to low-frequency sounds, human hearing can distinguish even small changes in frequency, whereas at high frequencies it cannot. Therefore, in consideration of human hearing, the low-frequency range is divided finely and the high-frequency range coarsely, so that the low-frequency bands are narrow and the high-frequency bands wide.
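The two band-division schemes above can be sketched as follows. The band count of ten and the 20 Hz to 20,000 Hz range follow the text; the use of logarithmic spacing for the hearing-motivated division is an assumption.

```python
import numpy as np

def equal_band_edges(n_bands=10, f_min=20.0, f_max=20000.0):
    """Bands of identical width: roughly 2 kHz each for ten bands over
    the 20 Hz to 20 kHz audible range."""
    return np.linspace(f_min, f_max, n_bands + 1)

def hearing_band_edges(n_bands=10, f_min=20.0, f_max=20000.0):
    """Hearing-motivated division: narrow bands at low frequencies and
    wide bands at high frequencies (logarithmic spacing assumed)."""
    return np.geomspace(f_min, f_max, n_bands + 1)
```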
Based on the power of each frequency band, the determination unit 220 determines, as common frequency bands, those bands whose power is equal to or greater than a predetermined value and which are shared with the adjacent section. For example, the determination unit 220 selects the frequency bands whose power in the current section is A or greater and whose power in at least one previous section is A or greater (or the frequency bands having the five largest powers in the current section and in the previous section), and determines the bands selected in both the previous and current sections as the common frequency bands. Limiting the selection to bands with a predetermined power or greater serves to locate the sound objects with large signal amplitudes: the influence of sound objects with small amplitudes is minimized, and the influence of the dominant sound objects is maximized. Another purpose of determining the common frequency bands is to decide whether a new sound object that was not present in the previous section has been produced in the current section, or whether a characteristic (for example, the position of production) of a previously existing sound object has changed.
The generation unit 230 produces the sound depth information based on the difference between the power of a common frequency band in the previous section and its power in the current section. For convenience, suppose the common frequency band is 3,000 to 4,000 Hz. If the power of the 3,000-4,000 Hz components is 3 W in the previous section and 4.5 W in the current section, the power of the common frequency band has increased. This can be taken to indicate that the sound object in the current section is produced at a position closer to the user. That is, if the difference between the powers of a common frequency band in adjacent sections is greater than a threshold value, it can indicate that the distance between the sound object and the reference position has changed.
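The following sketch computes per-band powers with an FFT and derives a depth index from the power growth of the common frequency bands. The mapping of the power ratio to a 0-to-1 index is an illustrative choice; the text only requires that a larger power change yield a larger depth index.

```python
import numpy as np

def band_powers(section, band_edges, fs=44100):
    """Power of each frequency band of one section, via the FFT."""
    spectrum = np.fft.rfft(section)
    freqs = np.fft.rfftfreq(len(section), d=1.0 / fs)
    power = np.abs(spectrum) ** 2
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

def depth_index(prev_powers, curr_powers, power_floor, max_ratio=4.0):
    """Depth index in [0, 1] from the power change of the common bands.

    A band is 'common' when its power meets the floor in both sections.
    A power ratio of 1 (no change) maps to 0; a ratio of max_ratio or
    more maps to 1. Both constants are illustrative assumptions.
    """
    common = (prev_powers >= power_floor) & (curr_powers >= power_floor)
    if not common.any():
        return 0.0
    ratio = (curr_powers[common] / prev_powers[common]).max()
    if ratio <= 1.0:  # no common band grew louder: object not approaching
        return 0.0
    return min((ratio - 1.0) / (max_ratio - 1.0), 1.0)
```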
According to an exemplary embodiment, when the power of a common frequency band changes between adjacent sections, depth map information about the 3D image may be consulted to determine whether there is an image object approaching the user, that is, protruding out of the screen. If an image object is approaching the user while the power of the common frequency band changes, it can be determined that the position where the sound object is produced moves along with the image object.
The generation unit 230 may determine that the larger the change in the power of a common frequency band between the previous and current sections, the closer to the user the sound object corresponding to that band is produced in the current section, compared with the sound object corresponding to the same band in the previous section.
Fig. 3 is a block diagram of a stereophonic sound reproducing apparatus 300 that provides stereophonic sound using a two-channel sound signal, according to an exemplary embodiment.
If the input signal is a multi-channel sound signal, it is first down-mixed to a stereo signal, after which the method of the exemplary embodiment is applied.
A fast Fourier transform (FFT) unit 310 performs an FFT on the input signal.
An inverse fast Fourier transform (IFFT) unit 320 performs an IFFT on the FFT-transformed signal.
A center signal extraction unit 330 extracts the center signal, corresponding to the center channel, from the stereo signal; that is, it extracts the signal with a large inter-channel correlation. In Fig. 3, the sound depth information is assumed to be produced based on the center channel signal. However, this is only an example, and other channel signals (for example, the left or right front channel signal, or the left or right surround channel signal) may be used instead.
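A minimal sketch of one way the center extraction could operate on the FFT spectra is given below. The similarity mask (equal to 1 when the left and right bins match in magnitude and phase) is a common heuristic and an assumption, not the exact formula of unit 330.

```python
import numpy as np

def extract_center(left_spec, right_spec, eps=1e-12):
    """Estimate the center-channel spectrum from the L/R FFT spectra.

    Bins where the two channels agree in magnitude and phase are kept
    as center content: the mask below equals 1 when L == R, shrinks as
    the channels diverge, and suppresses anti-phase bins entirely.
    """
    sim = 2.0 * np.real(left_spec * np.conj(right_spec))
    sim = sim / (np.abs(left_spec) ** 2 + np.abs(right_spec) ** 2 + eps)
    mask = np.clip(sim, 0.0, 1.0)
    return 0.5 * (left_spec + right_spec) * mask
```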
A sound stage (sound field) expanding unit 350 expands the sound stage by artificially applying a time difference or a phase difference to the stereo signal, so that the sound stage is positioned outside the speakers.
A sound depth information acquisition unit 360 obtains the sound depth information based on the center signal.
A parameter calculation unit 370 determines, based on the sound depth information, the control parameter values needed to impart sound perspective to a sound object.
A level control unit 371 controls the amplitude of the input signal.
A phase control unit 372 adjusts the phase of the input signal.
A reflection effect providing unit 373 simulates the reflection signal produced when the input signal is reflected by, for example, a wall.
A proximity effect providing unit 374 simulates a sound signal produced at a position close to the user.
A mixing unit 380 mixes at least one signal and outputs the mixed signal to the speakers.
Hereinafter, the operation of the stereophonic sound reproducing apparatus 300 is described in chronological order.
First, when a multi-channel sound signal is input, it is converted into a stereo signal through a down-mixer (not shown).
The FFT unit 310 performs an FFT on the stereo signal and outputs the transformed signal to the center signal extraction unit 330.
The center signal extraction unit 330 compares the transformed stereo channel signals and outputs the signal with the largest correlation as the center channel signal.
The sound depth information acquisition unit 360 produces the sound depth information based on the center channel signal, in the manner described above with reference to Fig. 2. That is, the power of each frequency band is first calculated for each of the sections constituting the center channel signal, and the common frequency bands are determined based on the calculated powers. The power change of the common frequency bands across at least two adjacent sections is then measured, and a depth index is set according to the power change. The larger the power change of a common frequency band between adjacent sections, the more strongly the corresponding sound object must be expressed as approaching the user, and therefore the larger the depth index value set for the sound object.
The parameter calculation unit 370 calculates, based on the depth index value, the parameters to be applied by the modules that impart the sound perspective.
The phase control unit 372 copies the center channel signal into two signals and adjusts the phases of the copies according to the calculated parameters. When sound signals of different phases are reproduced through a left speaker and a right speaker, blurring occurs; the stronger the blurring, the harder it is for the user to pinpoint the position where the sound object is produced. Because of this phenomenon, using the phase control method together with the other perspective-providing methods increases the perspective effect. The closer to the user the sound object is produced (or the faster its position approaches the user), the larger the phase control unit 372 may set the phase difference between the copied signals. The phase-adjusted copies pass through the IFFT unit 320 and are sent to the reflection effect providing unit 373.
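A minimal sketch of this phase-control step, assuming the center channel is available as an FFT spectrum; the linear mapping from depth index to phase difference and the maximum difference of pi/2 are illustrative assumptions.

```python
import numpy as np

def phase_control(center_spec, depth, max_diff=np.pi / 2):
    """Copy the center spectrum into two signals and push their phases
    apart in proportion to the depth index."""
    shift = 0.5 * depth * max_diff
    first = center_spec * np.exp(1j * shift)    # feed for the first speaker
    second = center_spec * np.exp(-1j * shift)  # feed for the second speaker
    return first, second
```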
The reflection effect providing unit 373 simulates the reflection signal. When a sound object is produced far from the user, the direct sound, which reaches the user without being reflected by, for example, a wall, and the reflected sound produced by such a reflection have similar amplitudes, and there is almost no difference in their arrival times at the user. However, when a sound object is produced near the user, the amplitudes of the direct and reflected sounds differ considerably, as do their arrival times. Therefore, the closer to the user the sound object is produced, the more the reflection effect providing unit 373 reduces the gain of the reflection signal, increases its delay time, or increases the amplitude of the direct sound. The reflection effect providing unit 373 then sends the center channel signal, with the reflection signal taken into account, to the proximity effect providing unit 374.
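This reflection behaviour can be sketched in the time domain as one attenuated, delayed copy of the direct sound. The gain and delay ranges below are illustrative assumptions consistent with the rule that a nearer object yields a weaker, later reflection.

```python
import numpy as np

def add_reflection(direct, depth, fs=44100):
    """Mix one simulated wall reflection into the direct sound.

    The nearer the object (larger depth index), the weaker and the
    later the reflection relative to the direct sound.
    """
    gain = 0.6 * (1.0 - depth)                 # near object: weak reflection
    delay = int(fs * (0.005 + 0.025 * depth))  # near object: long delay
    out = np.copy(direct)
    if 0 < delay < len(direct):
        out[delay:] += gain * direct[:-delay]
    return out
```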
The proximity effect providing unit 374 simulates a sound object produced at a position close to the user, based on the parameter values calculated by the parameter calculation unit 370. When a sound object is produced close to the user, its low-frequency band component becomes prominent; thus, the closer to the user the position where the sound object is produced, the more the proximity effect providing unit 374 boosts the low-frequency band component of the center signal.
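A sketch of the proximity effect using a one-pole low-pass filter to isolate and boost the low band; the 200 Hz cutoff and the boost amount are assumptions, and the signal is assumed to be a float NumPy array.

```python
import numpy as np

def proximity_boost(signal, depth, fs=44100, cutoff=200.0):
    """Boost the low-frequency band in proportion to the depth index."""
    alpha = np.exp(-2.0 * np.pi * cutoff / fs)  # one-pole low-pass factor
    low = np.empty_like(signal)
    acc = 0.0
    for i, x in enumerate(signal):  # y[i] = (1 - a) * x[i] + a * y[i-1]
        acc = (1.0 - alpha) * x + alpha * acc
        low[i] = acc
    return signal + 1.5 * depth * low  # up to 1.5x extra low band
```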
The sound stage expanding unit 350, which receives the stereo input signal, processes it so that the sound stage is positioned outside the speakers. When the spacing between the speakers is appropriate, the user can hear stereophonic sound with a realistic sense of presence.
The sound stage expanding unit 350 transforms the stereo input signal into a widened stereo signal. It may include a widening filter, obtained by convolving left/right binaural synthesis filters with a crosstalk canceller, and a panorama filter, obtained by convolving the widening filter with the left/right direct filters. The widening filter forms a virtual sound source at an arbitrary position, based on a head-related transfer function (HRTF) measured at a predetermined position, and cancels the crosstalk of the virtual sound source using filter coefficients that reflect the HRTF. The left/right direct filters adjust signal characteristics, such as the gain and delay, between the original stereo signal and the crosstalk-cancelled virtual sound source.
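A heavily simplified sketch of the crosstalk-cancelled widening structure: each output channel mixes its own channel's direct path with an inverted widening path of the opposite channel. The four-tap filter coefficients are placeholders; a real implementation derives them from measured HRTFs as described above.

```python
import numpy as np

# Placeholder 4-tap FIR filters standing in for the HRTF-derived
# widening and direct filters; entirely illustrative coefficients.
H_WIDEN = np.array([0.9, -0.3, 0.15, -0.05])
H_DIRECT = np.array([1.0, 0.0, 0.0, 0.0])

def widen(left, right):
    """Lattice-form widening: own-channel direct path minus the
    opposite channel passed through the widening filter."""
    out_l = (np.convolve(left, H_DIRECT)[:len(left)]
             - np.convolve(right, H_WIDEN)[:len(right)])
    out_r = (np.convolve(right, H_DIRECT)[:len(right)]
             - np.convolve(left, H_WIDEN)[:len(left)])
    return out_l, out_r
```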
The level control unit 371 adjusts the power of the sound object based on the depth index calculated by the parameter calculation unit 370: the closer to the user the sound object is produced, the more the level control unit 371 increases its power.
The mixing unit 380 combines the stereo signal sent from the level control unit 371 with the center signal sent from the proximity effect providing unit 374, and outputs the result to the speakers.
Figs. 4A to 4D illustrate examples of providing stereophonic sound according to an exemplary embodiment.
Fig. 4A illustrates the case where the stereophonic sound object according to an exemplary embodiment is not active.
The user listens to the sound objects through at least one speaker. A user reproducing a mono signal through a single speaker experiences no stereophonic effect, whereas a user reproducing a stereo signal through two or more speakers can experience a stereophonic effect.
Fig. 4B illustrates the case where a sound object with a depth index of 0 is reproduced. In Figs. 4A to 4D, the depth index is assumed to take values from 0 to 1: the closer to the user the sound object is to be expressed, the larger the depth index value.
Since the depth index of the sound object is 0, no operation for imparting perspective to it is performed. However, by positioning the sound stage outside the speakers, the user can experience a better stereophonic effect from the stereo signal. According to an exemplary embodiment, the technique of positioning the sound stage outside the speakers is referred to as widening.
In general, sound signals of a plurality of channels are needed to reproduce stereophonic sound. Therefore, when a mono signal is input, sound signals corresponding to at least two channels are produced by up-mixing.
Stereophonic sound is reproduced by outputting the sound signal of the first channel through the left speaker and the sound signal of the second channel through the right speaker. The user experiences a stereophonic effect by hearing at least two sounds produced at different positions.
However, if the left and right speakers are located too close to each other, the user perceives the sounds as being produced at the same position and cannot experience a stereophonic effect. In this case, the sound signal is processed so that the sound is perceived as being produced not at the physical positions of the speakers but outside them, for example, at regions outside the speakers (such as at surround speakers or in regions adjacent to the speakers).
Fig. 4C illustrates the case where a sound object with a depth index of 0.3 is reproduced, according to an exemplary embodiment.
Since the depth index of the sound object is greater than 0, a perspective corresponding to the depth index of 0.3 is imparted to the sound object, in addition to the widening technique. The user can therefore perceive the sound object as being produced closer than the position where it is actually produced.
For example, suppose the user is watching 3D image data in which an image object is expressed as protruding out of the screen. In Fig. 4C, sound perspective is imparted to the sound object corresponding to that image object, so the sound object is processed as if it were approaching the user. The user perceives the protruding image together with the approaching sound object and thus experiences a more realistic stereophonic effect.
Fig. 4D illustrates the case where a sound object with a depth index of 1 is reproduced.
Since the depth index of the sound object is greater than 0, a sound perspective corresponding to the depth index of 1 is imparted to the sound object, in addition to the widening technique. Because the depth index of the sound object in Fig. 4D is larger than that of the sound object in Fig. 4C, the user perceives the sound object as being produced at a position closer than in Fig. 4C.
Fig. 5 is a flowchart of a method of producing sound depth information based on a sound signal, according to an exemplary embodiment.
In operation S510, the power of each frequency band is calculated for each of the sections constituting the sound signal.
In operation S520, the common frequency bands are determined based on the powers of the frequency bands.
The common frequency bands are those whose power is equal to or greater than a predetermined value and which are shared by the previous section and the current section. Frequency bands with low power may correspond to insignificant sound objects, such as noise, and may therefore be excluded from the common frequency bands. For example, a predetermined number of frequency bands may be selected in descending order of power, and the common frequency bands may then be determined from among the selected bands.
In operation S530, the power of each common frequency band in the previous section is compared with its power in the current section, and a depth index value is determined based on the comparison result. If the power of a common frequency band in the current section is greater than its power in the previous section, the sound object corresponding to that band is determined to be produced at a position closer to the user. If the powers in the current and previous sections are similar, the sound object is determined not to be approaching the user.
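Putting the Fig. 5 operations together, the following sketch runs the whole S510 to S530 pipeline using the helper functions sketched earlier (split_into_sections, equal_band_edges, band_powers, depth_index); the power floor is an illustrative threshold.

```python
import numpy as np

# Assumes split_into_sections, equal_band_edges, band_powers and
# depth_index from the sketches above.
def depth_indices(signal, fs=44100, power_floor=1e-3):
    """Depth index for every pair of adjacent sections (S510 to S530)."""
    sections = split_into_sections(signal)
    edges = equal_band_edges()
    powers = [band_powers(s, edges, fs) for s in sections]
    return [depth_index(prev, curr, power_floor)
            for prev, curr in zip(powers[:-1], powers[1:])]
```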
Figs. 6A to 6D illustrate an example of producing sound depth information from a sound signal, according to an exemplary embodiment.
Fig. 6A illustrates a sound signal divided into a plurality of sections along the time axis, according to an exemplary embodiment.
Figs. 6B to 6D illustrate the powers of the frequency bands in a first section 601, a second section 602, and a third section 603. In Figs. 6B to 6D, the first section 601 and the second section 602 are previous sections, and the third section 603 is the current section.
Referring to Figs. 6B and 6C, the powers of the 3,000-4,000 Hz, 4,000-5,000 Hz, and 5,000-6,000 Hz bands are similar in the first section 601 and the second section 602, so these bands are determined to be common frequency bands.
Referring to Figs. 6C and 6D, assuming that the powers of the 3,000-4,000 Hz, 4,000-5,000 Hz, and 5,000-6,000 Hz bands are equal to or greater than the predetermined value in all of the first section 601, the second section 602, and the third section 603, these bands are determined to be common frequency bands.
However, the power of the 5,000-6,000 Hz band in the third section 603 is substantially larger than its power in the second section 602. Accordingly, the depth index of the sound object corresponding to the 5,000-6,000 Hz band is set to be equal to or greater than 0. According to an exemplary embodiment, an image depth map may also be consulted to determine the depth index of the sound object.
For example, the increase in the power of the 5,000-6,000 Hz band from the second section 602 to the third section 603 may not mean that the position where the corresponding sound object is produced is approaching the user; the power may simply have increased while the position remained the same. Here, if the image frame corresponding to the third section 603 contains an image object protruding out of the screen when the image depth map is consulted, the sound object corresponding to the 5,000-6,000 Hz band is highly likely to correspond to that image object. In that case, the position where the sound object is produced is gradually approaching the user, so the depth index of the sound object is set to 0 or greater. On the other hand, if no image object protruding out of the screen exists in the image frame corresponding to the third section 603, the power of the sound object can be regarded as simply having increased at the same position, and the depth index of the sound object may be set to 0.
Fig. 7 is a flowchart of a method of reproducing stereophonic sound according to an exemplary embodiment.
In operation S710, sound depth information is obtained. The sound depth information denotes the distance between at least one sound object within a sound signal and a reference position.
In operation S720, sound perspective is imparted to the sound object based on the sound depth information. Operation S720 may include at least one of operations S721 through S724.
In operation S721, the power gain of the sound object is adjusted based on the sound depth information.
In operation S722, the gain and the delay time of the reflection signal produced when the sound object is reflected by an obstacle are adjusted based on the sound depth information.
In operation S723, the low-frequency band component of the sound object is adjusted based on the sound depth information.
In operation S724, the difference between the phase of the sound object to be output from a first speaker and the phase of the sound object to be output from a second speaker is adjusted.
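The Fig. 7 operations can be combined into a single rendering step per sound object, reusing the earlier sketches; the gain law used for S721 is an illustrative assumption.

```python
import numpy as np

# Assumes add_reflection and proximity_boost from the sketches above.
def render_perspective(sound_object, depth, fs=44100):
    """Apply operations S721 to S723 to one sound object (time domain).

    S724, the inter-speaker phase difference, acts on the two speaker
    feeds and is sketched separately above (phase_control).
    """
    out = sound_object * (1.0 + depth)      # S721: nearer, so louder
    out = add_reflection(out, depth, fs)    # S722: reflection gain/delay
    return proximity_boost(out, depth, fs)  # S723: low-band boost
```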

Claims (10)

1. A method of reproducing stereophonic sound, the method comprising:
obtaining sound depth information, the sound depth information denoting a distance between at least one sound object within a sound signal and a reference position; and
providing sound perspective to the sound object output from a speaker, based on the sound depth information,
wherein the sound signal is divided into a plurality of adjacent sections,
wherein the obtaining of the sound depth information comprises: calculating a power of each frequency band of each of a previous section and a current section; determining, based on the calculated power of each frequency band, a frequency band whose power is equal to or greater than a predetermined value and which is shared with an adjacent section as a common frequency band; and obtaining the sound depth information based on a difference between the power of the common frequency band in the current section and the power of the common frequency band in the previous section.
2. The method of claim 1, further comprising:
obtaining a center channel signal, which is output to a center speaker, from the sound signal,
wherein the calculating of the power comprises calculating the power of each frequency band based on the center channel signal.
3. The method of claim 1, wherein the providing of the sound perspective comprises:
adjusting a power of the sound object based on the sound depth information.
4. The method of claim 1, wherein the providing of the sound perspective comprises:
adjusting, based on the sound depth information, a gain and a delay time of a reflection signal produced when the sound object is reflected.
5. The method of claim 1, wherein the providing of the sound perspective comprises:
adjusting a magnitude of a low-frequency band component of the sound object based on the sound depth information.
6. The method of claim 1, wherein the providing of the sound perspective comprises:
adjusting a difference between a phase of the sound object to be output from a first speaker and a phase of the sound object to be output from a second speaker.
7. The method of claim 1, further comprising:
outputting the sound object to which the sound perspective has been provided, through a left surround speaker and a right surround speaker, or through a left front speaker and a right front speaker.
8. The method of claim 1, further comprising:
positioning a sound stage in an outer region of the speakers by using the sound signal.
9. A stereophonic sound reproducing apparatus comprising:
an information acquisition unit which obtains sound depth information, the sound depth information denoting a distance between at least one sound object within a sound signal and a reference position; and
a perspective providing unit which provides sound perspective to the sound object based on the sound depth information,
wherein the sound signal is divided into a plurality of adjacent sections,
wherein the information acquisition unit comprises: a power calculation unit which calculates a power of each frequency band of each of a previous section and a current section; a determination unit which determines, based on the calculated power of each frequency band, a frequency band whose power is equal to or greater than a predetermined value and which is shared with an adjacent section as a common frequency band; and a generation unit which produces the sound depth information based on a difference between the power of the common frequency band in the current section and the power of the common frequency band in the previous section.
10. The apparatus of claim 9, further comprising:
a signal acquisition unit which obtains a center channel signal, which is output to a center speaker, from the sound signal,
wherein the power calculation unit calculates the power of each frequency band based on the channel signal corresponding to the center channel signal.
CN201180033247.8A 2010-05-04 2011-05-04 Method and apparatus for reproducing stereophonic sound Active CN102972047B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US33098610P 2010-05-04 2010-05-04
US61/330,986 2010-05-04
KR10-2011-0022451 2011-03-14
KR1020110022451A KR101764175B1 (en) 2010-05-04 2011-03-14 Method and apparatus for reproducing stereophonic sound
PCT/KR2011/003337 WO2011139090A2 (en) 2010-05-04 2011-05-04 Method and apparatus for reproducing stereophonic sound

Publications (2)

Publication Number Publication Date
CN102972047A CN102972047A (en) 2013-03-13
CN102972047B true CN102972047B (en) 2015-05-13

Family

ID=45393150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180033247.8A Active CN102972047B (en) 2010-05-04 2011-05-04 Method and apparatus for reproducing stereophonic sound

Country Status (12)

Country Link
US (2) US9148740B2 (en)
EP (1) EP2561688B1 (en)
JP (1) JP5865899B2 (en)
KR (1) KR101764175B1 (en)
CN (1) CN102972047B (en)
AU (1) AU2011249150B2 (en)
BR (1) BR112012028272B1 (en)
CA (1) CA2798558C (en)
MX (1) MX2012012858A (en)
RU (1) RU2540774C2 (en)
WO (1) WO2011139090A2 (en)
ZA (1) ZA201209123B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101717787B1 (en) * 2010-04-29 2017-03-17 엘지전자 주식회사 Display device and method for outputting of audio signal
JP2012151663A (en) * 2011-01-19 2012-08-09 Toshiba Corp Stereophonic sound generation device and stereophonic sound generation method
JP5776223B2 (en) * 2011-03-02 2015-09-09 ソニー株式会社 SOUND IMAGE CONTROL DEVICE AND SOUND IMAGE CONTROL METHOD
FR2986932B1 (en) * 2012-02-13 2014-03-07 Franck Rosset PROCESS FOR TRANSAURAL SYNTHESIS FOR SOUND SPATIALIZATION
KR20150032253A (en) * 2012-07-09 2015-03-25 엘지전자 주식회사 Enhanced 3d audio/video processing apparatus and method
CN103686136A (en) * 2012-09-18 2014-03-26 宏碁股份有限公司 Multimedia processing system and audio signal processing method
EP2733964A1 (en) * 2012-11-15 2014-05-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Segment-wise adjustment of spatial audio signal to different playback loudspeaker setup
BR112016001738B1 (en) 2013-07-31 2023-04-04 Dolby International Ab METHOD, APPARATUS INCLUDING AN AUDIO RENDERING SYSTEM AND NON-TRANSITORY MEANS OF PROCESSING SPATIALLY DIFFUSE OR LARGE AUDIO OBJECTS
KR102226420B1 (en) 2013-10-24 2021-03-11 삼성전자주식회사 Method of generating multi-channel audio signal and apparatus for performing the same
CN104683933A (en) 2013-11-29 2015-06-03 杜比实验室特许公司 Audio object extraction method
CN105323701A (en) * 2014-06-26 2016-02-10 冠捷投资有限公司 Method for adjusting sound effect according to three-dimensional images and audio-video system employing the method
US10163295B2 (en) * 2014-09-25 2018-12-25 Konami Gaming, Inc. Gaming machine, gaming machine control method, and gaming machine program for generating 3D sound associated with displayed elements
US9930469B2 (en) * 2015-09-09 2018-03-27 Gibson Innovations Belgium N.V. System and method for enhancing virtual audio height perception
CN108806560A (en) * 2018-06-27 2018-11-13 四川长虹电器股份有限公司 Screen singing display screen and sound field picture synchronization localization method
KR20200027394A (en) * 2018-09-04 2020-03-12 삼성전자주식회사 Display apparatus and method for controlling thereof
US11032508B2 (en) * 2018-09-04 2021-06-08 Samsung Electronics Co., Ltd. Display apparatus and method for controlling audio and visual reproduction based on user's position

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1714600A (en) * 2002-10-15 2005-12-28 韩国电子通信研究院 Method for generating and consuming 3d audio scene with extended spatiality of sound source
WO2009116800A2 (en) * 2008-03-20 2009-09-24 Park Seung-Min Display device with object-oriented stereo sound coordinate display

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06269096A (en) 1993-03-15 1994-09-22 Olympus Optical Co Ltd Sound image controller
DE19735685A1 (en) 1997-08-19 1999-02-25 Wampfler Ag Non contact electrical energy transmission device for personal vehicle
JPH11220800A (en) 1998-01-30 1999-08-10 Onkyo Corp Sound image moving method and its device
EP0932325B1 (en) 1998-01-23 2005-04-27 Onkyo Corporation Apparatus and method for localizing sound image
KR19990068477A (en) 1999-05-25 1999-09-06 김휘진 3-dimensional sound processing system and processing method thereof
RU2145778C1 (en) 1999-06-11 2000-02-20 Розенштейн Аркадий Зильманович Image-forming and sound accompaniment system for information and entertainment scenic space
ATE269622T1 (en) 2000-04-13 2004-07-15 Qvc Inc DEVICE AND METHOD FOR DIGITAL BROADCASTING WITH TARGETED SOUND CONTENT
US6829018B2 (en) * 2001-09-17 2004-12-07 Koninklijke Philips Electronics N.V. Three-dimensional sound creation assisted by visual information
RU23032U1 (en) 2002-01-04 2002-05-10 Гребельский Михаил Дмитриевич AUDIO TRANSMISSION SYSTEM
AU2003269551A1 (en) 2002-10-15 2004-05-04 Electronics And Telecommunications Research Institute Method for generating and consuming 3d audio scene with extended spatiality of sound source
GB2397736B (en) 2003-01-21 2005-09-07 Hewlett Packard Co Visualization of spatialized audio
RU2232481C1 (en) 2003-03-31 2004-07-10 Волков Борис Иванович Digital tv set
KR100677119B1 (en) 2004-06-04 2007-02-02 삼성전자주식회사 Apparatus and method for reproducing wide stereo sound
JP2006128816A (en) * 2004-10-26 2006-05-18 Victor Co Of Japan Ltd Recording program and reproducing program corresponding to stereoscopic video and stereoscopic audio, recording apparatus and reproducing apparatus, and recording medium
KR100688198B1 (en) * 2005-02-01 2007-03-02 엘지전자 주식회사 terminal for playing 3D-sound And Method for the same
US20060247918A1 (en) * 2005-04-29 2006-11-02 Microsoft Corporation Systems and methods for 3D audio programming and processing
JP4835298B2 (en) * 2006-07-21 2011-12-14 ソニー株式会社 Audio signal processing apparatus, audio signal processing method and program
KR100922585B1 (en) * 2007-09-21 2009-10-21 한국전자통신연구원 SYSTEM AND METHOD FOR THE 3D AUDIO IMPLEMENTATION OF REAL TIME e-LEARNING SERVICE
KR101415026B1 (en) * 2007-11-19 2014-07-04 삼성전자주식회사 Method and apparatus for acquiring the multi-channel sound with a microphone array
JP5274359B2 (en) 2009-04-27 2013-08-28 三菱電機株式会社 3D video and audio recording method, 3D video and audio playback method, 3D video and audio recording device, 3D video and audio playback device, 3D video and audio recording medium
KR101690252B1 (en) 2009-12-23 2016-12-27 삼성전자주식회사 Signal processing method and apparatus

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1714600A (en) * 2002-10-15 2005-12-28 韩国电子通信研究院 Method for generating and consuming 3d audio scene with extended spatiality of sound source
WO2009116800A2 (en) * 2008-03-20 2009-09-24 Park Seung-Min Display device with object-oriented stereo sound coordinate display

Also Published As

Publication number Publication date
CN102972047A (en) 2013-03-13
BR112012028272A2 (en) 2016-11-01
US9148740B2 (en) 2015-09-29
ZA201209123B (en) 2017-04-26
WO2011139090A2 (en) 2011-11-10
RU2012151848A (en) 2014-06-10
JP5865899B2 (en) 2016-02-17
JP2013529017A (en) 2013-07-11
CA2798558A1 (en) 2011-11-10
KR101764175B1 (en) 2017-08-14
RU2540774C2 (en) 2015-02-10
US9749767B2 (en) 2017-08-29
US20110274278A1 (en) 2011-11-10
CA2798558C (en) 2018-08-21
BR112012028272B1 (en) 2021-07-06
KR20110122631A (en) 2011-11-10
MX2012012858A (en) 2013-04-03
EP2561688B1 (en) 2019-02-20
EP2561688A4 (en) 2015-12-16
EP2561688A2 (en) 2013-02-27
AU2011249150B2 (en) 2014-12-04
US20150365777A1 (en) 2015-12-17
AU2011249150A1 (en) 2012-12-06
WO2011139090A3 (en) 2012-01-05

Similar Documents

Publication Publication Date Title
CN102972047B (en) Method and apparatus for reproducing stereophonic sound
US9918179B2 (en) Methods and devices for reproducing surround audio signals
JP5944840B2 (en) Stereo sound reproduction method and apparatus
KR101572894B1 (en) A method and an apparatus of decoding an audio signal
KR20080060640A (en) Method and apparatus for reproducing a virtual sound of two channels based on individual auditory characteristic
CN101112120A (en) Apparatus and method of processing multi-channel audio input signals to produce at least two channel output signals therefrom, and computer readable medium containing executable code to perform the me
KR102160248B1 (en) Apparatus and method for localizing multichannel sound signal
EP2368375B1 (en) Converter and method for converting an audio signal
Bai et al. Upmixing and downmixing two-channel stereo audio for consumer electronics
US20190394596A1 (en) Transaural synthesis method for sound spatialization
US20200059750A1 (en) Sound spatialization method
CA3192986A1 (en) Sound reproduction with multiple order hrtf between left and right ears
Grosse et al. Evaluation of a perceptually optimized room-in-room reproduction method for playback room compensation
JP2010263295A (en) Speaker device and sound reproducing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant