WO2010022633A1 - Methods and devices for generating and playing an audio signal, and processing system - Google Patents
- Publication number
- WO2010022633A1 (PCT/CN2009/073406)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- information
- video
- audio
- distance information
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/239—Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/271—Image signal generators wherein the generated image signals comprise depth maps or disparity maps
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/403—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers loud-speakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/406—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2203/00—Details of circuits for transducers, loudspeakers or microphones covered by H04R3/00 but not provided for in any of its subgroups
- H04R2203/12—Beamforming aspects for stereophonic sound reproduction with loudspeaker arrays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Definitions
- The present invention relates to the field of communications technologies, and in particular to methods and devices for generating and playing an audio signal, and to a processing system.
Background
- 3D video technology can present pictures with depth information based on the principles of stereo vision.
- 3D audio technology picks up sound with a microphone array. Methods such as beamforming yield an enhanced sound together with the direction and distance of the sound.
- A loudspeaker array using wave field synthesis can reproduce sounds with a sense of direction and distance.
- FIG. 1A is a horizontal view of an original conference-site layout in the prior art. Seven participants attend in total. Participant P1 sits in the front row and participant P2 sits in the rear row.
- FIG. 1B is a schematic diagram of the scene of the site in FIG. 1A displayed on a screen in a reproduction site in the prior art. If a participant in the reproduction site is located at point O, then O, the position of P1, and the position of P2 lie on one straight line.
- Stereo vision has its own imaging and display principles. When 3D display technology is used to show an object from site 1 in another site 2, the object can be made to appear to participants in site 2 either in front of display 21, for example at position C, or behind it, for example at position B.
- Suppose the object displayed in site 2 is a participant from site 1.
- The participant's corresponding position in site 1 is A. Site 2 may display the participant in front of the screen, for example at position C, while the sound appears to come from B. This mismatch between image and sound impairs communication between the participants in site 2 and those in site 1.
- The inventors found that the prior art mostly obtains a more accurate direction and distance of the sound in one of two ways: by increasing the number of microphones in the microphone array, or by increasing the spacing between them, or both. More microphones and larger spacing make the estimated direction and distance more accurate, but they also make the array larger. Conversely, fewer and more closely spaced microphones degrade the accuracy of the estimated direction and especially of the estimated distance. This is a problem in scenarios where the distance of the sound must be taken into account during playback.
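The spacing trade-off can be illustrated with a simple far-field model. This model is assumed here and is not taken from the text. For two microphones a distance d apart, a source at angle θ from broadside arrives with a relative delay τ = d·sin θ / c. If delays are resolved only to whole samples at rate f_s, the smallest non-zero angle near broadside is arcsin(c / (f_s·d)). A larger spacing therefore gives finer direction steps. The speed of sound and the 16 kHz rate used below are also assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degC

def broadside_angle_step_deg(spacing_m: float, sample_rate_hz: float) -> float:
    """Arrival angle (degrees from broadside) that corresponds to a delay of
    one sample for a two-microphone pair in the far-field model."""
    # One sample of delay equals a path difference of c / fs metres.
    ratio = (SPEED_OF_SOUND / sample_rate_hz) / spacing_m
    if ratio >= 1.0:
        # The microphones are closer than one sample of travel, so no
        # non-zero angle can be resolved.
        return float("nan")
    return math.degrees(math.asin(ratio))

for d in (0.05, 0.1, 0.2, 0.4):
    print(f"spacing {d:.2f} m -> {broadside_angle_step_deg(d, 16000):.2f} deg")
```

Finer steps come at the cost of a physically larger array. This is the trade-off the embodiments try to avoid by taking distance from the auxiliary video instead.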
- Embodiments of the present invention provide methods and devices for generating and playing an audio signal, and a processing system. These can obtain more accurate position information of an audio signal, including direction information and distance information, without increasing the size of the microphone array.
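- An embodiment of the present invention provides a method for generating an audio signal, including:
- generating distance information of the audio signal corresponding to the position of the viewpoint, according to the obtained direction information of the audio signal and an auxiliary video, where the auxiliary video is a disparity map or a depth map; and
- encoding and transmitting the audio signal, the direction information of the audio signal, and the distance information of the audio signal.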
- An embodiment of the present invention provides an apparatus for generating an audio signal, including:
- an audio signal distance information acquiring module, configured to generate distance information of the audio signal corresponding to the position of the viewpoint, according to the obtained direction information of the audio signal and the auxiliary video, where the auxiliary video is a disparity map or a depth map; and
- an audio signal encoding module, configured to encode and transmit the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
- An embodiment of the present invention provides a method for playing an audio signal, including:
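- decoding received encoded data to obtain the audio signal and the direction information of the audio signal;
- obtaining distance information of the audio signal;
- processing the audio signal with an audio signal reproduction method, according to the direction information and the distance information of the audio signal, to obtain a speaker signal for each speaker; and
- playing the speaker signals using a speaker array or a surround sound system.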
- An embodiment of the present invention provides a playback apparatus for an audio signal, including:
- an audio signal decoding module, configured to decode received encoded data to obtain the audio signal and the direction information of the audio signal;
- a receiving-end audio signal distance information acquiring module, configured to obtain distance information of the audio signal;
- a speaker signal acquiring module, configured to:
  - receive the audio signal and its direction information from the audio signal decoding module;
  - receive the distance information of the audio signal from the receiving-end distance information acquiring module; and
  - process the audio signal with an audio signal reproduction method, according to the direction and distance information, to obtain a speaker signal for each speaker; and
- a speaker signal playback module, configured to play the speaker signals using a speaker array or a surround sound system.
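The reproduction method used by the speaker signal acquiring module is not specified here. As a rough sketch, a plain delay-and-attenuate rendering is swapped in; it is not a full wave field synthesis driving function. Each loudspeaker receives the audio signal delayed in proportion to its distance from the virtual source position, which would come from the direction and distance information. The signal is also scaled by 1/r. The function name, the 343 m/s speed of sound, and the whole-sample delays are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air

def render_point_source(signal, source_xyz, speaker_xyz, fs):
    """Per-loudspeaker feeds for a virtual point source at source_xyz.

    Each feed is the input delayed by its extra travel distance (rounded to
    whole samples) and scaled by 1/r relative to the closest loudspeaker.
    This is a simple delay-and-attenuate rendering, not WFS.
    """
    signal = np.asarray(signal, dtype=float)
    speakers = np.asarray(speaker_xyz, dtype=float)
    r = np.linalg.norm(speakers - np.asarray(source_xyz, dtype=float), axis=1)
    r = np.maximum(r, 1e-3)  # guard against a source placed on a speaker
    delays = np.round((r - r.min()) / SPEED_OF_SOUND * fs).astype(int)
    gains = r.min() / r  # the closest loudspeaker gets gain 1.0
    n = len(signal)
    feeds = np.zeros((len(speakers), n + int(delays.max())))
    for i, (d, g) in enumerate(zip(delays, gains)):
        feeds[i, d:d + n] = g * signal
    return feeds, delays, gains
```

Both the delays and the gains depend on the full source position. Moving the source along the same direction to a different distance therefore changes the relative feeds, not just the overall level.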
- Embodiments of the present invention provide an audio signal processing system, including an audio signal generating device and an audio signal playing device.
- The audio signal generating device includes:
  - an audio signal distance information acquiring module, configured to generate distance information of the audio signal corresponding to the position of the viewpoint, according to the acquired direction information of the audio signal and the auxiliary video, where the auxiliary video is a disparity map or a depth map; and
  - an audio signal encoding module, configured to encode and transmit the audio signal, its direction information, and its distance information.
- The audio signal playing device includes:
  - an audio signal decoding module, configured to decode received encoded data to obtain the audio signal and its direction information;
  - a receiving-end audio signal distance information acquiring module, configured to acquire distance information of the audio signal;
  - a speaker signal acquiring module, configured to process the audio signal with an audio signal reproduction method, according to the direction and distance information of the audio signal, to obtain a speaker signal for each speaker; and
  - a speaker signal playback module, configured to play the speaker signals using a speaker array or a surround sound system.
- Embodiments of the invention combine the 3D video signal with the 3D audio signal. This lets them accurately obtain the position information of the audio signal, including direction information and distance information, without increasing the size of the microphone array. The audio signal can then be transmitted and played back.
- FIG. 1A is a horizontal view of an original conference-site layout in the prior art;
- FIG. 1B is a schematic diagram of the scene of the conference site in FIG. 1A displayed on a screen in a reproduction site in the prior art;
- FIG. 3 is a schematic flowchart of Embodiment 1 of the method for generating an audio signal according to the present invention;
- FIG. 4 is a schematic flowchart of Embodiment 2 of the method for generating an audio signal according to the present invention;
- FIG. 5 is a schematic diagram of calculating the abscissa of the audio signal in the auxiliary video in Embodiment 2 of the method for generating an audio signal according to the present invention;
- FIG. 6 is a schematic diagram of calculating the ordinate of the audio signal in the auxiliary video in Embodiment 2 of the method for generating an audio signal according to the present invention;
- FIG. 7 is a schematic diagram of the relationship among image parallax, depth, and the distance of the viewpoint from the display in a parallel camera system in Embodiment 2 of the method for generating an audio signal according to the present invention;
- FIG. 8 is a schematic diagram, in the xz plane, of calculating the distance of the audio signal in Embodiment 2 of the method for generating an audio signal according to the present invention;
- FIG. 9 is a schematic diagram, in the yz plane, of calculating the distance of the audio signal in Embodiment 2 of the method for generating an audio signal according to the present invention;
- FIG. 10 is a schematic flowchart of Embodiment 3 of the method for generating an audio signal according to the present invention;
- FIG. 11 is a schematic structural diagram of Embodiment 1 of the apparatus for generating an audio signal according to the present invention;
- FIG. 12 is a schematic structural diagram of Embodiment 2 of the apparatus for generating an audio signal according to the present invention;
- FIG. 13 is a schematic structural diagram of Embodiment 3 of the apparatus for generating an audio signal according to the present invention;
- FIG. 14 is a schematic flowchart of Embodiment 1 of the method for playing an audio signal according to the present invention;
- FIG. 15 is a schematic flowchart of Embodiment 2 of the method for playing an audio signal according to the present invention;
- FIG. 16 is a schematic flowchart of Embodiment 3 of the method for playing an audio signal according to the present invention;
- FIG. 17 is a schematic flowchart of Embodiment 4 of the method for playing an audio signal according to the present invention;
- FIG. 18 is a schematic structural diagram of Embodiment 1 of the audio signal playing device according to the present invention;
- FIG. 19 is a schematic structural diagram of Embodiment 2 of the audio signal playing device according to the present invention;
- FIG. 20 is a schematic structural diagram of Embodiment 3 of the audio signal playing device according to the present invention;
- FIG. 21 is a schematic structural diagram of Embodiment 4 of the audio signal playing device according to the present invention;
- FIG. 22 is a schematic structural diagram of an embodiment of the audio signal processing system according to the present invention.
Detailed description
- FIG. 3 is a schematic flowchart of Embodiment 1 of the method for generating an audio signal according to the present invention. The method may specifically include the following steps:
- Step 11: generate distance information of the audio signal corresponding to the position of the viewpoint, according to the obtained direction information of the audio signal and the auxiliary video, where the auxiliary video is a disparity map or a depth map;
- Step 12: encode and transmit the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
- In this embodiment, the distance information of the audio signal is obtained accurately from the acquired direction information and the auxiliary video, without increasing the size of the microphone array. The audio signal can then be transmitted.
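The encoding format for Step 12 is left open. A minimal sketch can show how the audio signal and its position metadata might travel together. It assumes a hypothetical frame layout:

- three little-endian float32 fields (azimuth, elevation, distance);
- a uint32 sample count;
- 16-bit PCM samples.

A practical system would more likely carry such parameters as side information of a real audio codec.

```python
import struct

# Hypothetical frame header (not specified by the source): azimuth and
# elevation in degrees and distance in metres as little-endian float32,
# followed by the number of 16-bit PCM samples as uint32.
HEADER = struct.Struct("<fffI")

def encode_frame(samples, azimuth_deg, elevation_deg, distance_m):
    """Pack the audio samples together with their direction and distance."""
    header = HEADER.pack(azimuth_deg, elevation_deg, distance_m, len(samples))
    payload = struct.pack(f"<{len(samples)}h", *samples)
    return header + payload

def decode_frame(frame):
    """Inverse of encode_frame: returns (samples, azimuth, elevation, distance)."""
    azimuth, elevation, distance, count = HEADER.unpack_from(frame, 0)
    samples = list(struct.unpack_from(f"<{count}h", frame, HEADER.size))
    return samples, azimuth, elevation, distance
```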
- FIG. 4 is a schematic flowchart of Embodiment 2 of the method for generating an audio signal according to the present invention. The method may specifically include the following steps:
- Step 21: the microphone array captures at least two audio signals as an input audio stream;
- Step 22: process the input audio stream with a microphone array processing method to obtain an enhanced audio signal and the direction information of the audio signal;
- Step 23: the camera group captures at least two video signals as an input video stream;
- Step 24: obtain the main video and the auxiliary video from the input video stream.
- Step 11 may specifically include the following steps:
- Step 26: obtain the depth information of the audio signal according to the direction information of the audio signal and the auxiliary video;
- Step 27: obtain the coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal;
- Step 28: generate the distance information of the audio signal corresponding to the position of the viewpoint, according to the coordinate information of the audio signal and the position information of the viewpoint.
- Before Step 26, the following step may also be included:
- Step 25: convert the auxiliary video and the direction information of the audio signal into the same coordinate system, according to the position information of the microphone array and the camera group.
- Step 26 may specifically include the following steps:
- Step 261: obtain the coordinates of the audio signal in the auxiliary video according to the direction information of the audio signal, and determine whether the auxiliary video is a depth map or a disparity map. If it is a depth map, perform Step 262. If it is a disparity map, perform Step 263.
- Step 262: obtain the depth information corresponding to the audio signal directly from the depth map at those coordinates;
- Step 263: obtain the disparity corresponding to the audio signal from the disparity map at those coordinates, and calculate the depth information corresponding to the audio signal from that disparity.
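Steps 261 to 263 amount to a lookup followed by an optional conversion. The text does not say how the map type or the conversion is represented, so this sketch makes two assumptions. The map type is passed in as a flag, and the disparity-to-depth conversion is passed in as a callable. Both of these, and the name `depth_at`, are hypothetical. Fractional coordinates are rounded to the nearest pixel and clamped to the image bounds.

```python
from typing import Callable

import numpy as np

def depth_at(aux_map: np.ndarray, row: float, col: float, is_depth_map: bool,
             disparity_to_depth: Callable[[float], float]) -> float:
    """Steps 261-263: read the auxiliary video at the audio signal's
    coordinates and return a depth value."""
    rows, cols = aux_map.shape[:2]
    # Round to the nearest pixel and clamp to the image bounds.
    r = min(max(int(round(row)), 0), rows - 1)
    c = min(max(int(round(col)), 0), cols - 1)
    value = float(aux_map[r, c])
    if is_depth_map:
        return value  # Step 262: the depth map stores depth directly
    return disparity_to_depth(value)  # Step 263: convert disparity to depth
```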
- Step 21 may specifically include the following steps:
- Step 211: the microphone array captures at least two audio signals as a first input audio stream. Each audio signal is a mixture of the sounds of multiple sound sources.
- Step 212: separate the audio signals in the first input audio stream with an audio signal separation method to obtain one audio signal for the sound of each sound source, and combine these per-source audio signals into the input audio stream.
- In this embodiment, the microphone array consists of at least two microphones, so the input audio stream contains at least two audio signals. A microphone array processing method, such as beamforming, is then applied to the input audio stream. This yields the enhanced audio signal and the direction information of the audio signal.
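Beamforming is named only as an example method. The sketch below swaps in the simplest variant for a two-microphone array, assumed here for illustration. It works in three stages:

1. It picks the inter-microphone delay that maximizes the cross-correlation.
2. It converts that delay to an arrival angle with the far-field relation sin θ = c·τ / d.
3. It averages the time-aligned channels (delay-and-sum) to obtain an enhanced signal.

It handles a single source, uses whole-sample delays, and assumes c = 343 m/s.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air

def delay_and_sum_2mic(x0, x1, fs, spacing):
    """Two-microphone delay-and-sum sketch.

    Returns (angle_rad, delay_samples, enhanced): the arrival angle from
    broadside, the delay of x1 relative to x0 in samples, and the average
    of x0 and the time-aligned x1.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    n = len(x0)
    # Only delays up to spacing / c are physically possible.
    max_lag = int(np.ceil(spacing / SPEED_OF_SOUND * fs))

    def xcorr(lag):
        # sum over t of x0[t] * x1[t + lag]
        if lag >= 0:
            return float(np.dot(x0[:n - lag], x1[lag:]))
        return float(np.dot(x0[-lag:], x1[:n + lag]))

    delay = max(range(-max_lag, max_lag + 1), key=xcorr)
    # Far-field model: delay / fs = spacing * sin(angle) / c
    sin_angle = np.clip(SPEED_OF_SOUND * delay / (fs * spacing), -1.0, 1.0)
    angle = float(np.arcsin(sin_angle))

    # Advance x1 by `delay` samples so it lines up with x0, then average.
    aligned = np.zeros(n)
    if delay >= 0:
        aligned[:n - delay] = x1[delay:]
    else:
        aligned[-delay:] = x1[:n + delay]
    return angle, delay, 0.5 * (x0 + aligned)
```

Averaging the aligned channels reinforces the source in the steered direction while uncorrelated noise partly cancels. Larger arrays use the same principle with more channels and fractional delays.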
- The camera group consists of at least two cameras, so the input video stream contains at least two video signals. The main video and the auxiliary video are then obtained from the input video stream. With two cameras, one auxiliary video can be obtained; with more than two cameras, multiple auxiliary videos can be obtained. One or more video streams of the input video stream are also selected as the main video. In the simplest case, with two cameras, the video captured by one of them is taken as the main video.
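The text does not say how the auxiliary video is computed from the camera views. One common option, assumed here, is block matching on a rectified stereo pair. This sketch uses the sum of absolute differences and returns integer pixel disparities referenced to the left view. Camera-image disparities of this kind are not the same quantity as on-screen parallax. They would still need scaling to the display geometry.

```python
import numpy as np

def block_match_disparity(left, right, max_disp=16, half_win=2):
    """Disparity map (in pixels) for a rectified stereo pair using
    sum-of-absolute-differences block matching. A pixel at column x in the
    left image is matched against column x - d in the right image."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half_win, h - half_win):
        rows = slice(y - half_win, y + half_win + 1)
        for x in range(half_win, w - half_win):
            patch = left[rows, x - half_win:x + half_win + 1]
            best_cost, best_d = np.inf, 0
            # Keep the candidate window inside the right image.
            for d in range(0, min(max_disp, x - half_win) + 1):
                cand = right[rows, x - d - half_win:x - d + half_win + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```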
- There is no strict ordering between Steps 21 and 22 on the one hand and Steps 23 and 24 on the other. In a specific implementation, Steps 23 and 24 may be performed first, followed by Steps 21 and 22. Either order achieves the technical effects of the embodiments of the present invention.
- FIG. 5 is a schematic diagram of calculating the abscissa of the audio signal in the auxiliary video in Embodiment 2 of the method for generating an audio signal according to the present invention.
- The origin O corresponds to the center of the camera lens.
- The z-axis runs along the optical axis, perpendicular to the camera lens.
- The plane formed by the x-axis and the y-axis is perpendicular to the z-axis.
- The object plane contains the sound source point S and is perpendicular to the z-axis. Its distance from O along the z-axis is the object distance.
- The image plane contains the image point of S and is perpendicular to the z-axis. Its distance from O along the z-axis is the image distance, which equals the focal length $f$ of the camera.
- Let $x_S$ and $y_S$ be the distances of S from the y-axis and the x-axis. Let $u$ and $v$ be the distances of the image point $P_2$ of S from the y-axis and the x-axis.
- The microphone array measures the vector $\vec{OS}$ from point O to point S. Let $\alpha$ be the angle between the projection of $\vec{OS}$ on the xz plane and the z-axis. By the properties of right triangles, $u = f\tan\alpha$.
- FIG. 6 is a schematic diagram of calculating the ordinate of the audio signal in the auxiliary video.
- The microphone array again measures the vector $\vec{OS}$. Let $\beta$ be the angle between its projection on the yz plane and the z-axis. Then $v = f\tan\beta$.
- This gives the coordinates $(u, v)$ of the image point $P_2$ corresponding to the sound source point S. The auxiliary video has the same size and position as the camera image, so $(u, v)$ are also the coordinates of the point corresponding to S in the auxiliary video.
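The angle-to-coordinate step can be written as $u = f\tan\alpha$ and $v = f\tan\beta$. These follow from similar triangles, since the image distance equals the focal length $f$. The conversion to pixel indices below is an added assumption. `pixel_pitch`, `cx`, and `cy` are hypothetical calibration parameters, and the image row index is taken to grow downwards.

```python
import math

def angles_to_image_coords(alpha, beta, focal_length):
    """Image-plane coordinates (u, v) of the sound source: u = f*tan(alpha),
    v = f*tan(beta), with both angles measured from the optical (z) axis."""
    return focal_length * math.tan(alpha), focal_length * math.tan(beta)

def image_to_pixel(u, v, pixel_pitch, cx, cy):
    """Map metric image-plane coordinates to (column, row) pixel positions.
    pixel_pitch, cx and cy are hypothetical calibration parameters; rows are
    assumed to grow downwards while v grows upwards."""
    return cx + u / pixel_pitch, cy - v / pixel_pitch
```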
- If the auxiliary video is a depth map, the depth information corresponding to the sound source point S is obtained directly from the depth map at these coordinates.
- If the auxiliary video is a disparity map, the disparity corresponding to S is obtained from the disparity map at these coordinates, and the depth is calculated from it. This uses the geometry of FIG. 7, with the parallax taken as $p = x_R - x_L$ and the depth measured from the screen towards the viewer:
$$z = \frac{pD}{p - x_B}$$
- Here $z$ is the depth, $p$ is the parallax, $D$ is the distance of the viewpoint from the display, and $x_B$ is the distance between a person's two eyes.
- FIG. 7 is a schematic diagram of the relationship among image parallax, depth, and the distance of the viewpoint from the display in a parallel camera system in Embodiment 2 of the method for generating an audio signal according to the present invention.
- The origin O of the coordinate system lies on the display screen, and the z-axis points towards the viewpoint.
- The x-axis corresponds to the display.
- $z$ denotes the depth.
- $p$ denotes the parallax.
- $D$ denotes the distance of the viewpoint from the display.
- $x_B$ denotes the distance between a person's two eyes.
- The left and right eyes are located at $(0, D)$ and $(x_B, D)$ respectively.
- The position of the audio signal is $(x, z)$.
- $x_L$ and $x_R$ are the display coordinates of the point in the left and right views respectively.
- The parallax $p$ is the distance between $x_L$ and $x_R$, that is, $p = x_R - x_L$.
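The depth-from-parallax relation follows from similar triangles in FIG. 7. Take the eyes at $(0, D)$ and $(x_B, D)$. A point $(x, z)$ then projects onto the screen at:

$$x_L = \frac{xD}{D - z}, \qquad x_R = x_B + \frac{(x - x_B)D}{D - z}$$

This gives $p = x_R - x_L = -\frac{x_B z}{D - z}$, and hence $z = \frac{pD}{p - x_B}$. The sign convention ($p = x_R - x_L$, with $z$ positive towards the viewer) is an assumption. The sketch checks the relation by projecting points through both eyes and recovering their depth.

```python
def depth_from_parallax(p, D, x_B):
    """Depth z (from the screen towards the viewer) of a point whose screen
    parallax is p = x_R - x_L, for viewing distance D and eye separation x_B."""
    if p >= x_B:
        raise ValueError("parallax must be smaller than the eye separation")
    return p * D / (p - x_B)

def project_to_screen(x, z, eye_x, D):
    """Screen coordinate where the ray from an eye at (eye_x, D) through the
    point (x, z) meets the display plane z = 0."""
    t = D / (D - z)
    return eye_x + (x - eye_x) * t
```

Points in front of the screen give negative (crossed) parallax. Points behind it give positive parallax smaller than $x_B$. Zero parallax places the point on the screen.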
- FIG. 8 is a schematic diagram, in the xz plane, of calculating the distance of the audio signal in Embodiment 2 of the method for generating an audio signal according to the present invention. Let $P = (x, y, z)$ be the coordinate information of the audio signal, where the depth $z$ has already been obtained. The microphone array measures the vector from the coordinate origin O to point P.
- Let $\alpha$ be the angle between the projection of this vector on the xz plane and the z-axis. The abscissa of the audio signal is then $x = z\tan\alpha$.
- The distance information of the audio signal is the length of the vector $\vec{VP}$ from the viewpoint V to P.
- FIG. 9 is a schematic diagram, in the yz plane, of calculating the distance of the audio signal in Embodiment 2 of the method for generating an audio signal according to the present invention.
- The microphone array measures the vector from the coordinate origin O to point P. Let $\beta$ be the angle between its projection on the yz plane and the z-axis. The y coordinate of the audio signal in the display site is then $y = z\tan\beta$.
- The distance information of the audio signal is therefore the length of $\vec{VP}$. For a viewpoint $V = (x_V, y_V, z_V)$, this is $|VP| = \sqrt{(x - x_V)^2 + (y - y_V)^2 + (z - z_V)^2}$.
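Steps 27 and 28 can be sketched in two parts. The first applies the relations $x = z\tan\alpha$ and $y = z\tan\beta$, which are our reading of FIGS. 8 and 9. The second takes the Euclidean length of VP. The viewpoint coordinates are an input here. Exactly where the viewpoint is placed (for example between the eyes, at distance D from the display) is an assumption left to the caller.

```python
import math

def audio_position(z, alpha, beta):
    """Display-site coordinates (x, y, z) of the audio signal from its depth z
    and the measured angles: x = z*tan(alpha), y = z*tan(beta)."""
    return (z * math.tan(alpha), z * math.tan(beta), z)

def distance_to_viewpoint(position, viewpoint):
    """Length of the vector VP between the viewpoint V and the audio
    position P, both given as (x, y, z) in the same coordinate system."""
    return math.dist(position, viewpoint)
```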
- This embodiment may further include the following step:
- Step 210: encode and send the auxiliary video.
- In this embodiment, the microphone array and the camera group capture an input audio stream and an input video stream respectively. The direction information of the audio signal and the auxiliary video are derived from these streams. The distance information of the audio signal is then calculated from the direction information and the auxiliary video.
- Combining the 3D video signal with the 3D audio signal in this way yields accurate position information for the audio signal, including direction and distance information, without increasing the size of the microphone array. The audio signal can then be transmitted.
- In addition, an audio signal separation method separates the audio signal corresponding to each sound source from the first input audio stream captured by the microphone array. These per-source audio signals are combined into the input audio stream. Further processing of this stream yields accurate position information, including direction and distance information, for the audio signal of each sound source.
- The coordinate system in which the microphone array measures the direction of the sound source does not necessarily coincide with the coordinate system of the camera system. The two coordinate systems therefore need to be transformed so that all calculations are performed in the same coordinate system.
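A minimal sketch of the coordinate conversion is shown below. It assumes that the rotation from the microphone-array frame to the camera frame is known from calibration; this is the hypothetical `rotation_cam_from_mic`. The angles follow the convention tan α = x/z and tan β = y/z. Only the rotation is applied, so the offset between the two devices is ignored. That is acceptable only when the source is far away compared with this offset.

```python
import numpy as np

def angles_to_vector(alpha, beta):
    """Unit direction with tan(alpha) = x/z and tan(beta) = y/z (z > 0)."""
    v = np.array([np.tan(alpha), np.tan(beta), 1.0])
    return v / np.linalg.norm(v)

def vector_to_angles(v):
    """Inverse of angles_to_vector for directions with z > 0."""
    return float(np.arctan2(v[0], v[2])), float(np.arctan2(v[1], v[2]))

def mic_to_camera_angles(alpha, beta, rotation_cam_from_mic):
    """Re-express a direction measured in the microphone-array frame in the
    camera frame. Only the rotation is applied; the translation between the
    two devices is ignored, which suits distant sources only."""
    direction = rotation_cam_from_mic @ angles_to_vector(alpha, beta)
    return vector_to_angles(direction)
```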
- Alternatively, Step 210 may be: encode and send the main video and the auxiliary video.
- In this way, combining the 3D video signal with the 3D audio signal yields accurate position information for the audio signal, including direction and distance information, without increasing the size of the microphone array. Both the audio signal and the video signal can then be transmitted.
Audio signal generating apparatus, Embodiment 1
- FIG. 11 is a schematic structural diagram of Embodiment 1 of the apparatus for generating an audio signal according to the present invention. The apparatus may specifically include an audio signal distance information acquiring module 31 and an audio signal encoding module 32. The audio signal encoding module 32 is connected to the audio signal distance information acquiring module 31.
- The audio signal distance information acquiring module 31 is configured to generate distance information of the audio signal corresponding to the position of the viewpoint, according to the acquired direction information of the audio signal and the auxiliary video, where the auxiliary video is a disparity map or a depth map.
- The audio signal encoding module 32 is configured to encode and transmit the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
- In this embodiment, the audio signal distance information acquiring module 31 generates the distance information of the audio signal from the acquired direction information and the auxiliary video. The audio signal encoding module 32 then encodes and transmits the audio signal, its direction information, and its distance information. Combining the 3D video signal with the 3D audio signal in this way yields accurate position information for the audio signal, including direction and distance information, without increasing the size of the microphone array, and the audio signal can be transmitted.
- The audio signal distance information acquiring module 31 may include a depth information acquiring unit 311, a coordinate information acquiring unit 312, and a distance information acquiring unit 313. The coordinate information acquiring unit 312 is connected to the depth information acquiring unit 311, and the distance information acquiring unit 313 is connected to the coordinate information acquiring unit 312.
- The depth information acquiring unit 311 is configured to obtain the depth information of the audio signal according to the direction information of the audio signal and the auxiliary video.
- The coordinate information acquiring unit 312 is configured to obtain the coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal.
- The distance information acquiring unit 313 is configured to generate the distance information of the audio signal relative to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint, and to send this distance information to the audio signal encoding module 32.
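The chain formed by units 311, 312 and 313 can be illustrated with a minimal sketch. It assumes a rectified stereo camera pair (so depth Z = f·B/d for focal length f in pixels, baseline B and disparity d), a pinhole camera with principal point (cx, cy), a sound direction given as azimuth and elevation in radians in the camera frame (x right, y up, z forward), and a viewpoint already expressed in that same frame. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def direction_to_pixel(azimuth: float, elevation: float, f_px: float,
                       cx: float, cy: float) -> Tuple[int, int]:
    """Project a direction (radians, camera frame) onto the image plane."""
    x_over_z = math.tan(azimuth)
    y_over_z = math.tan(elevation) / math.cos(azimuth)
    col = round(cx + f_px * x_over_z)
    row = round(cy - f_px * y_over_z)  # image rows grow downward
    return row, col


def depth_from_disparity_map(disparity: Sequence[Sequence[float]], azimuth: float,
                             elevation: float, f_px: float, baseline_m: float,
                             cx: float, cy: float) -> float:
    """Depth unit (sketch): read the disparity along the sound direction
    and convert it with Z = f * B / d."""
    row, col = direction_to_pixel(azimuth, elevation, f_px, cx, cy)
    d = disparity[row][col]
    if d <= 0:
        raise ValueError("no valid disparity at the sound-source pixel")
    return f_px * baseline_m / d


def source_coordinates(depth: float, azimuth: float, elevation: float) -> Vec3:
    """Coordinate unit (sketch): intersect the direction ray with the plane Z = depth."""
    return (depth * math.tan(azimuth),
            depth * math.tan(elevation) / math.cos(azimuth),
            depth)


def distance_to_viewpoint(source: Vec3, viewpoint: Vec3) -> float:
    """Distance unit (sketch): Euclidean distance from the viewpoint to the source."""
    return math.dist(source, viewpoint)
```

For example, a uniform disparity of 20 px with f = 800 px and B = 0.1 m gives Z = 4 m straight ahead, and a viewpoint 1 m behind the camera is then 5 m from the source. If the auxiliary video is already a depth map, the f·B/d conversion is simply skipped.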
- This embodiment may further include a microphone array 33, an audio input signal processing module 34, a video capture module 35, and a video input signal processing module 36. The audio input signal processing module 34 is connected to the microphone array 33, and the video input signal processing module 36 is connected to the video capture module 35.
- The microphone array 33 is configured to capture at least two audio signals as an input audio stream. The audio input signal processing module 34 is configured to process the input audio stream with a microphone array processing method to obtain an enhanced audio signal and the direction information of the audio signal.
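The microphone array processing method is left unspecified. One common way to obtain direction information from a pair of microphones, shown here as an assumption rather than as the patent's method, is to find the time difference of arrival (TDOA) by cross-correlation and convert it to a far-field angle with θ = asin(c·τ/d), where c is the speed of sound (343 m/s assumed), τ the TDOA and d the microphone spacing.

```python
import math
from typing import Sequence


def estimate_tdoa(x1: Sequence[float], x2: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) maximizing sum_n x1[n] * x2[n + lag].
    A positive result means the sound reached microphone 2 later than microphone 1."""
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x1[n] * x2[n + lag]
                  for n in range(len(x1)) if 0 <= n + lag < len(x2))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def azimuth_from_tdoa(lag: int, fs: float, mic_spacing_m: float,
                      c: float = 343.0) -> float:
    """Far-field angle (radians from broadside) for a two-microphone pair."""
    s = c * (lag / fs) / mic_spacing_m
    return math.asin(max(-1.0, min(1.0, s)))  # clamp: noise can push |s| past 1
```

A real array would typically use more microphones and a weighted correlation such as GCC-PHAT for robustness; the plain correlation above only shows the principle.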
- The microphone array 33 may include a microphone array unit 330 and an audio signal separation unit 331.
- The microphone array unit 330 is configured to capture at least two audio signals as a first input audio stream, where each audio signal is a mixture of the sounds of multiple sound sources. The audio signal separation unit 331 is configured to separate each audio signal in the first input audio stream using an audio signal separation method, obtain the audio signal corresponding to the sound of each sound source, combine these per-source audio signals into the input audio stream, and send the input audio stream to the audio input signal processing module 34.
- The audio signal distance information acquiring module 31 may further include a coordinate transformation unit 314, connected to the video input signal processing module 36 and the audio input signal processing module 34. It is configured to convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group, send the coordinate-transformed auxiliary video and direction information of the audio signal to the depth information acquiring unit 311, and send the coordinate-transformed direction information of the audio signal to the coordinate information acquiring unit 312.
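The coordinate transformation unit 314 can be sketched as a rigid transform p_cam = R·p_mic + t, where R and t describe the pose of the microphone array relative to the camera group. This is a minimal sketch assuming a known calibration, a rotation only about the vertical axis, and the x right, y up, z forward convention; the names are hypothetical. A sound direction defines a ray, so its direction vector is rotated while its origin moves to t.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def yaw_rotation(yaw: float) -> Mat3:
    """Rotation about the vertical y axis (x right, y up, z forward)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return (sum(m[0][k] * v[k] for k in range(3)),
            sum(m[1][k] * v[k] for k in range(3)),
            sum(m[2][k] * v[k] for k in range(3)))


def transform_point(rotation: Mat3, translation: Vec3, p_mic: Vec3) -> Vec3:
    """p_cam = R p_mic + t: map a point from the array frame to the camera frame."""
    r = mat_vec(rotation, p_mic)
    return (r[0] + translation[0], r[1] + translation[1], r[2] + translation[2])


def transform_direction(rotation: Mat3, translation: Vec3,
                        azimuth: float, elevation: float) -> Tuple[Vec3, Vec3]:
    """Map the array's sound-direction ray into the camera frame.
    Returns (ray origin, unit direction); directions rotate but do not translate."""
    u = (math.cos(elevation) * math.sin(azimuth),
         math.sin(elevation),
         math.cos(elevation) * math.cos(azimuth))
    return transform_point(rotation, translation, (0.0, 0.0, 0.0)), mat_vec(rotation, u)
```

When the offset t is small compared with the distance to the talker, using the rotated direction alone is a reasonable approximation; otherwise the depth lookup should follow the ray from its transformed origin.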
- This embodiment may further include a first video encoding module 38, connected to the video input signal processing module 36 and configured to encode and transmit the auxiliary video.
- This embodiment may further include a sending end communication interface 39, connected to the audio signal encoding module 32 and the first video encoding module 38 and configured to transmit the encoded data over the network.
- The video capture module 35 typically uses a camera group of two cameras to capture the scene. Alternatively, a depth camera that measures depth directly can be used to obtain the depth information, in which case the video input signal processing module 36 is no longer required. Likewise, if the microphone array 33 provides the functionality of the audio input signal processing module 34, the audio input signal processing module 34 is no longer required.
- In this embodiment, the audio signal distance information acquiring module 31 generates the distance information of the audio signal from the direction information of the acquired audio signal and the auxiliary video; the audio signal encoding module 32 encodes and transmits the audio signal, its direction information, and its distance information; and the first video encoding module 38 encodes and transmits the auxiliary video. The position information of the audio signal, including direction information and distance information, is therefore obtained accurately by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and both the audio signal and the auxiliary video are transmitted.
- The coordinate transformation unit 314 transforms between the two coordinate systems so that all calculations are performed in the same coordinate system.
- By combining the three-dimensional video signal with the three-dimensional audio signal, this embodiment of the invention accurately obtains the position information of the audio signal, including direction information and distance information, without increasing the physical size of the microphone array, and thereby enables transmission and playback of the audio signal.
- FIG. 13 is a schematic structural diagram of Embodiment 3 of the apparatus for generating an audio signal according to the present invention.
- In this embodiment, the first video encoding module 38 may alternatively be a second video encoding module 315, which is configured to encode and transmit the main video and the auxiliary video.
- The sending end communication interface 39 is connected to the audio signal encoding module 32 and the second video encoding module 315.
- In this embodiment, the audio signal distance information acquiring module 31 generates the distance information of the audio signal from the direction information of the acquired audio signal and the auxiliary video; the audio signal encoding module 32 encodes and transmits the audio signal, its direction information, and its distance information; and the second video encoding module 315 encodes and transmits the main video and the auxiliary video. The position information of the audio signal, including direction information and distance information, is therefore obtained accurately by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and both the audio signal and the video signal are transmitted.
- The coordinate transformation unit 314 transforms between the two coordinate systems so that all calculations are performed in the same coordinate system.
- By combining the three-dimensional video signal with the three-dimensional audio signal, this embodiment of the invention accurately obtains the position information of the audio signal, including direction information and distance information, without increasing the physical size of the microphone array, and thereby enables transmission and playback of the audio signal.
- Audio signal playing method, Embodiment 1
- As shown in the schematic flowchart of Embodiment 1 of the method for playing an audio signal according to the present invention, the method may include the following steps:
- Step 41: Decode the received encoded data to obtain the audio signal and the direction information of the audio signal.
- Step 42: Obtain the distance information of the audio signal.
- Step 43: According to the direction information and the distance information of the audio signal, process the audio signal with an audio signal reproduction method to obtain a speaker signal for each speaker.
- Step 44: Play the speaker signals using a speaker array or a surround sound system.
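Step 43 does not name a particular audio signal reproduction method. As a deliberately simplified stand-in (not wave field synthesis and not the patent's method), the sketch below places a virtual point source at the decoded azimuth and distance in the horizontal plane and feeds each loudspeaker a copy of the signal delayed by the source-to-speaker travel time (rounded to whole samples) and scaled by 1/r. The listener is assumed at the origin with x right and y forward; speaker positions and names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def source_position(azimuth: float, distance: float) -> Vec2:
    """Virtual source in the horizontal plane (x right, y forward, listener at origin)."""
    return (distance * math.sin(azimuth), distance * math.cos(azimuth))


def render_point_source(signal: Sequence[float], azimuth: float, distance: float,
                        speakers: Sequence[Vec2], fs: float, c: float = 343.0,
                        min_r: float = 0.1) -> List[List[float]]:
    """One feed per loudspeaker: the signal delayed by the source-to-speaker
    travel time (whole samples) and attenuated by 1/r."""
    src = source_position(azimuth, distance)
    dists = [max(math.dist(src, spk), min_r) for spk in speakers]
    delays = [round(r / c * fs) for r in dists]
    base = min(delays)                     # remove latency common to all feeds
    length = max(delays) - base + len(signal)
    feeds = []
    for r, d in zip(dists, delays):
        feed = [0.0] * length
        for n, s in enumerate(signal):
            feed[d - base + n] = s / r     # spherical spreading loss
        feeds.append(feed)
    return feeds
```

The relative delays and level differences between speakers carry both the direction and the distance cues, which is why step 43 needs the distance information and not just the direction.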
- In this embodiment, the received encoded data is decoded to obtain the audio signal and its direction information, the distance information of the audio signal is obtained, the audio signal is processed according to its direction and distance information to obtain speaker signals, and the speaker signals are then played. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and the audio signal can then be played.
- FIG. 15 is a schematic flowchart of Embodiment 2 of the audio signal playing method of the present invention.
- Step 42 may specifically include:
- Step 421: Decode the received encoded data to obtain the distance information of the audio signal.
- In this embodiment, the received encoded data is decoded to obtain the audio signal, its direction information, and its distance information; the audio signal is processed according to its direction and distance information to obtain speaker signals; and the speaker signals are then played. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by decoding the received encoded data without increasing the physical size of the microphone array, and the audio signal can then be played.
- FIG. 16 is a schematic flowchart of Embodiment 3 of the method for playing an audio signal according to the present invention.
- The method may further include:
- Step 51: Decode the received encoded data to obtain the auxiliary video.
- Step 42 may specifically include:
- Step 422: Obtain the depth information of the audio signal according to the direction information of the audio signal and the auxiliary video;
- Step 423: Obtain the coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal;
- Step 424: Generate the distance information of the audio signal relative to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
- Before step 422, the method may further include:
- Step 421: Convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group.
- In this embodiment, the received encoded data is decoded to obtain the audio signal, its direction information, and the auxiliary video; the distance information of the audio signal is obtained from its direction information and the auxiliary video; the audio signal is processed according to its direction and distance information to obtain speaker signals; and the speaker signals are then played. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and the audio signal can then be played.
- The coordinate system in which the microphone array measures the direction of the sound source does not necessarily coincide with the coordinate system of the camera system. The two coordinate systems therefore need to be transformed so that all calculations are performed in the same coordinate system.
- FIG. 17 is a schematic flowchart of Embodiment 4 of the method for playing an audio signal according to the present invention.
- The method may further include:
- Step 52: Decode the received encoded data to obtain the auxiliary video and the main video.
- Step 42 may specifically include:
- Step 53: Obtain the depth information of the audio signal according to the direction information of the audio signal and the auxiliary video;
- Step 54: Obtain the coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal;
- Step 55: Generate the distance information of the audio signal relative to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
- The method may further include the following steps:
- Step 50: Convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group;
- Step 56: Process the main video and the auxiliary video with a three-dimensional video display method to obtain a display video signal;
- Step 57: Play the display video signal.
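Step 56 leaves the three-dimensional video display method open. Because the auxiliary video is a depth or disparity map, one widely used option is depth-image-based rendering (DIBR), which synthesizes an additional view from the main video. The sketch below is an assumption for illustration, not the patent's method: it warps a single image row to a horizontally shifted virtual camera, resolves collisions with a depth test, and leaves disocclusion holes unfilled. Pixel values are placeholder labels and the function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def render_virtual_row(colors: Sequence[str], depths: Sequence[float],
                       f_px: float, baseline_m: float) -> List[Optional[str]]:
    """Warp one row of the main video to a camera shifted by `baseline_m`.
    Each pixel moves left by its disparity f*B/Z; when two pixels land on the
    same target, the nearer one (smaller Z) wins. None marks disocclusion holes."""
    out: List[Optional[str]] = [None] * len(colors)
    zbuf = [math.inf] * len(colors)
    for x, (color, z) in enumerate(zip(colors, depths)):
        xt = x - round(f_px * baseline_m / z)
        if 0 <= xt < len(colors) and z < zbuf[xt]:
            zbuf[xt] = z
            out[xt] = color
    return out
```

For instance, with a near pixel in front of a distant background, the near pixel shifts further and covers a background pixel, leaving a hole behind it that a real renderer would fill by inpainting.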
- In this embodiment, the received encoded data is decoded to obtain the audio signal, its direction information, the auxiliary video, and the main video; the distance information of the audio signal is obtained from its direction information and the auxiliary video; the audio signal is processed according to its direction and distance information to obtain speaker signals; and the speaker signals are then played. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and the audio signal can then be played.
- In addition, the main video and the auxiliary video are processed to obtain a display video signal, which is then played, so that video playback is combined with audio playback.
- FIG. 18 is a schematic structural diagram of Embodiment 1 of the audio signal playing apparatus according to the present invention. The apparatus may include an audio signal decoding module 316, a receiving end audio signal distance information acquiring module 317, a speaker signal acquiring module 318, and a speaker signal playing module 319. The receiving end audio signal distance information acquiring module 317 is connected to the audio signal decoding module 316; the speaker signal acquiring module 318 is connected to both the audio signal decoding module 316 and the receiving end audio signal distance information acquiring module 317; and the speaker signal playing module 319 is connected to the speaker signal acquiring module 318.
- The audio signal decoding module 316 is configured to decode the received encoded data to obtain the audio signal and its direction information. The receiving end audio signal distance information acquiring module 317 is configured to obtain the distance information of the audio signal. The speaker signal acquiring module 318 is configured to receive the audio signal and its direction information from the audio signal decoding module 316, receive the distance information from the receiving end audio signal distance information acquiring module 317, and process the audio signal with an audio signal reproduction method according to the direction and distance information to obtain a speaker signal for each speaker. The speaker signal playing module 319 is configured to play the speaker signals using a speaker array or a surround sound system.
- If the speaker signal playing module 319 (for example, a speaker array) provides the functionality of the speaker signal acquiring module 318, the speaker signal acquiring module 318 is no longer needed.
- In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information; the receiving end audio signal distance information acquiring module 317 obtains the distance information of the audio signal; the speaker signal acquiring module 318 processes the audio signal according to its direction and distance information to obtain speaker signals; and the speaker signal playing module 319 plays the speaker signals. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and the audio signal can then be played.
- FIG. 19 is a schematic structural diagram of Embodiment 2 of the audio signal playing apparatus of the present invention.
- The receiving end audio signal distance information acquiring module 317 may specifically be an audio signal distance information decoding module 320, configured to decode the received encoded data to obtain the distance information of the audio signal.
- This embodiment may further include a receiving end communication interface 321, configured to receive encoded data sent over the network and pass it to the audio signal decoding module 316.
- In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information; the audio signal distance information decoding module 320 decodes the received encoded data to obtain the distance information of the audio signal; the speaker signal acquiring module 318 processes the audio signal according to its direction and distance information to obtain speaker signals; and the speaker signal playing module 319 then plays them. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by decoding the received encoded data without increasing the physical size of the microphone array, and the audio signal can then be played.
- FIG. 20 is a schematic structural diagram of Embodiment 3 of the audio signal playing apparatus according to the present invention.
- This embodiment may further include a first video signal decoding module 322, configured to decode the received encoded data to obtain the auxiliary video.
- The receiving end audio signal distance information acquiring module 317 may be the audio signal distance information acquiring module 31, connected to the audio signal decoding module 316 and the first video signal decoding module 322. Based on the structural diagram shown in FIG., it is configured to generate the distance information of the audio signal from the direction information of the audio signal and the auxiliary video.
- The audio signal distance information acquiring module 31 may include a depth information acquiring unit 311, a coordinate information acquiring unit 312, and a distance information acquiring unit 313. The coordinate information acquiring unit 312 is connected to the depth information acquiring unit 311, and the distance information acquiring unit 313 is connected to the coordinate information acquiring unit 312.
- The depth information acquiring unit 311 is configured to obtain the depth information of the audio signal according to the direction information of the audio signal and the auxiliary video.
- The coordinate information acquiring unit 312 is configured to obtain the coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal.
- The distance information acquiring unit 313 is configured to generate the distance information of the audio signal relative to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
- The audio signal distance information acquiring module 31 may further include a coordinate transformation unit 314, connected to the first video signal decoding module 322 and the audio signal decoding module 316. It is configured to convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group, send the coordinate-transformed auxiliary video and direction information of the audio signal to the depth information acquiring unit 311, and send the coordinate-transformed direction information of the audio signal to the coordinate information acquiring unit 312.
- This embodiment may further include a receiving end communication interface 321, configured to receive encoded data sent over the network and pass it to the audio signal decoding module 316 and the first video signal decoding module 322.
- In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information; the audio signal distance information acquiring module 31 generates the distance information of the audio signal from its direction information and the auxiliary video; the speaker signal acquiring module 318 processes the audio signal according to its direction and distance information to obtain speaker signals; and the speaker signal playing module 319 then plays them. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and the audio signal can then be played.
- FIG. 21 is a schematic structural diagram of Embodiment 4 of the apparatus for playing an audio signal according to the present invention.
- This embodiment may further include a second video signal decoding module 323, a video output signal processing module 324, and a video output module 325. The video output signal processing module 324 is connected to the second video signal decoding module 323, and the video output module 325 is connected to the video output signal processing module 324.
- The second video signal decoding module 323 is configured to decode the received encoded data to obtain the auxiliary video and the main video.
- The video output signal processing module 324 is configured to process the main video and the auxiliary video with a three-dimensional video display method to obtain a display video signal, and the video output module 325 is configured to play the display video signal.
- The receiving end audio signal distance information acquiring module 317 may specifically be the audio signal distance information acquiring module 31, connected to the audio signal decoding module 316 and the second video signal decoding module 323. Based on the structural diagram shown in FIG., it is configured to generate the distance information of the audio signal from the direction information of the audio signal and the auxiliary video.
- The audio signal distance information acquiring module 31 may include a depth information acquiring unit 311, a coordinate information acquiring unit 312, and a distance information acquiring unit 313. The coordinate information acquiring unit 312 is connected to the depth information acquiring unit 311, and the distance information acquiring unit 313 is connected to the coordinate information acquiring unit 312.
- The depth information acquiring unit 311 is configured to obtain the depth information of the audio signal according to the direction information of the audio signal and the auxiliary video.
- The coordinate information acquiring unit 312 is configured to obtain the coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal.
- The distance information acquiring unit 313 is configured to generate the distance information of the audio signal relative to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
- The audio signal distance information acquiring module 31 may further include a coordinate transformation unit 314, connected to the audio signal decoding module 316 and the second video signal decoding module 323. It is configured to convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group, send the coordinate-transformed auxiliary video and direction information of the audio signal to the depth information acquiring unit 311, and send the coordinate-transformed direction information of the audio signal to the coordinate information acquiring unit 312.
- This embodiment may further include a receiving end communication interface 321, configured to receive encoded data sent over the network and pass it to the audio signal decoding module 316 and the second video signal decoding module 323.
- The video output module 325 is typically a stereoscopic display. If the stereoscopic display provides the functionality of the video output signal processing module 324, the video output signal processing module 324 is no longer required.
- In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information; the audio signal distance information acquiring module 31 generates the distance information of the audio signal from its direction information and the auxiliary video; the speaker signal acquiring module 318 processes the audio signal according to its direction and distance information to obtain speaker signals; and the speaker signal playing module 319 plays them. The position information of the audio signal, including direction information and distance information, is thus accurately obtained by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, and the audio signal can then be played.
- In addition, the video output signal processing module 324 processes the main video and the auxiliary video with the three-dimensional video display method to obtain a display video signal, and the video output module 325 plays the display video signal, so that video playback is combined with audio playback.
- FIG. 22 is a schematic structural diagram of an embodiment of the audio signal processing system according to the present invention.
- The audio signal processing system 329 may include an audio signal generating apparatus 327 and an audio signal playing apparatus 328.
- The audio signal generating apparatus 327 may include an audio signal distance information acquiring module 31 and an audio signal encoding module 32, where the audio signal encoding module 32 is connected to the audio signal distance information acquiring module 31.
- The audio signal distance information acquiring module 31 is configured to generate, from the direction information of the acquired audio signal and the auxiliary video, the distance information of the audio signal relative to the position of the viewpoint, where the auxiliary video is a disparity map or a depth map. The audio signal encoding module 32 is configured to encode and transmit the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
- The audio signal playing apparatus 328 may include an audio signal decoding module 316, a receiving end audio signal distance information acquiring module 317, a speaker signal acquiring module 318, and a speaker signal playing module 319. The receiving end audio signal distance information acquiring module 317 is connected to the audio signal decoding module 316; the speaker signal acquiring module 318 is connected to both the audio signal decoding module 316 and the receiving end audio signal distance information acquiring module 317; and the speaker signal playing module 319 is connected to the speaker signal acquiring module 318.
- The audio signal decoding module 316 is configured to decode the received encoded data to obtain the audio signal and its direction information. The receiving end audio signal distance information acquiring module 317 is configured to obtain the distance information of the audio signal. The speaker signal acquiring module 318 is configured to process the audio signal with an audio signal reproduction method according to its direction and distance information to obtain a speaker signal for each speaker. The speaker signal playing module 319 is configured to play the speaker signals using a speaker array or a surround sound system.
- This embodiment may further include an echo cancellation module 320, connected to the audio signal generating apparatus 327 and the audio signal playing apparatus 328 and configured to cancel echo.
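The echo cancellation algorithm is not specified. A common choice, sketched here as an assumption rather than as the patent's design, is a normalized least-mean-squares (NLMS) adaptive filter: it learns the loudspeaker-to-microphone echo path from the far-end signal being played and subtracts the predicted echo from the microphone signal before encoding. Parameter values and names are illustrative.

```python
from typing import List, Sequence, Tuple


def nlms_echo_cancel(far_end: Sequence[float], mic: Sequence[float],
                     taps: int = 32, mu: float = 0.5,
                     eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Adaptively estimate the echo path with NLMS and subtract the predicted
    echo. Returns (echo-cancelled signal, final filter weights)."""
    w = [0.0] * taps
    x_buf = [0.0] * taps                  # most recent far-end sample first
    out = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))    # predicted echo
        e = d - y                                       # residual after cancellation
        step = mu * e / (eps + sum(xi * xi for xi in x_buf))
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]
        out.append(e)
    return out, w
```

With a multichannel loudspeaker array, as in this system, each loudspeaker feed contributes its own echo path, so a practical canceller has to handle several correlated reference signals; the single-channel version above only shows the adaptation rule.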
- The position information of the audio signal can thus be accurately obtained by combining the three-dimensional video signal with the three-dimensional audio signal without increasing the physical size of the microphone array, enabling transmission and playback of the audio signal.
- The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Stereophonic System (AREA)
- Circuit For Audible Band Transducer (AREA)
Description
Method and Apparatus for Generating and Playing Audio Signals, and System for Processing Audio Signals
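This application claims priority to Chinese Patent Application No. 200810119140.5, filed with the Chinese Patent Office on August 27, 2008 and entitled "Method and apparatus for generating and playing audio signals, and system for processing audio signals", which is incorporated herein by reference in its entirety.
Technical Field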
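The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for generating and playing audio signals, and a system for processing audio signals.
Background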
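Three-dimensional video technology can provide pictures that carry depth information and conform to the principles of stereoscopic vision. Three-dimensional audio technology picks up sound with a microphone array and, by methods such as beamforming, obtains enhanced sound together with information such as the direction and distance of the sound. For playback it uses a speaker array and, by methods such as wave field synthesis, reproduces sound with a sense of direction and distance. Some experimental three-dimensional video or three-dimensional audio systems already exist in the prior art.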
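FIG. 1A is a horizontal view of an original conference site layout in the prior art, with seven participants: participant P1 sits in the front row and participant P2 in the back row. FIG. 1B is a schematic diagram of the scene of the site in FIG. 1A as displayed on the screen of the reproducing site in the prior art. Suppose a participant at the reproducing site is located at point O. Point O, the position of P1 and the position of P2 lie on one straight line. If the distance of the reproduced sound is not processed, or is processed poorly, during sound field reproduction, the voices of P1 and P2 will not match their positions. When P1 and/or P2 speaks, the participant at point O then has difficulty telling whether P1 or P2 is speaking. A similar problem arises when the scene is reproduced as three-dimensional video. FIG. 2 is a top view of a conference site layout in the prior art. According to the imaging and display principles of stereoscopic vision, three-dimensional display technology can show an object from site 1 in another site 2 so that, as required, the object appears either in front of display screen 21 (for example at position C) or behind it (for example at position B). Assume that the object in FIG. 2 is a participant in site 1, located at position A in site 1. Suppose that in site 2 the participant is displayed in front of the screen, for example at position C, while the voice comes from position B. This mismatch also impairs communication between the participants in site 2 and the participant in site 1.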
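In the course of making the present invention, the inventors found the following problem with the prior art. To obtain more accurate sound direction and distance, most prior solutions increase the number of microphones in the microphone array and/or the spacing between microphones. For a microphone array, the more microphones it has and the larger their spacing, the more accurately the direction and distance of sound can be determined, but the array also becomes larger. Conversely, reducing the number of microphones or their spacing lowers the accuracy with which the array obtains the direction of sound, and especially its distance. Some scenarios must take the distance of the sound into account during playback. Examples are a conference system in which speakers may move freely, a conference site arranged in multiple rows as in FIG. 1A, and a three-dimensional video display system as in FIG. 1B. In these scenarios, the loss of accuracy prevents listeners from determining the speaker's position promptly and accurately, which impairs eye-to-eye communication.
Summary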
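Embodiments of the present invention provide a method and apparatus for generating and playing audio signals, and a system for processing audio signals. These embodiments obtain more accurate position information of an audio signal, including direction information and distance information, without increasing the size of the microphone array.
An embodiment of the present invention provides a method for generating an audio signal, including:
generating, according to acquired direction information of an audio signal and an auxiliary video, distance information of the audio signal corresponding to the position of a viewpoint, where the auxiliary video is a disparity map or a depth map;
encoding and sending the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
An embodiment of the present invention provides an apparatus for generating an audio signal, including:
an audio signal distance information acquiring module, configured to generate, according to acquired direction information of an audio signal and an auxiliary video, distance information of the audio signal corresponding to the position of a viewpoint, where the auxiliary video is a disparity map or a depth map;
an audio signal encoding module, configured to encode and send the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
An embodiment of the present invention provides a method for playing an audio signal, including:
decoding received encoded data to obtain an audio signal and direction information of the audio signal;
acquiring distance information of the audio signal;
processing the audio signal by using an audio signal reproduction method according to the direction information and the distance information of the audio signal, to obtain a speaker signal corresponding to each speaker;
playing the speaker signals by using a speaker array or a surround sound system.
An embodiment of the present invention provides an apparatus for playing an audio signal, including:
an audio signal decoding module, configured to decode received encoded data to obtain an audio signal and direction information of the audio signal;
a receiving-end audio signal distance information acquiring module, configured to acquire distance information of the audio signal;
a speaker signal acquiring module, configured to receive the audio signal and its direction information from the audio signal decoding module, receive the distance information of the audio signal from the receiving-end audio signal distance information acquiring module, and process the audio signal by using an audio signal reproduction method according to the direction information and the distance information, to obtain a speaker signal corresponding to each speaker;
a speaker signal playing module, configured to play the speaker signals by using a speaker array or a surround sound system.
An embodiment of the present invention provides an audio signal processing system, including an audio signal generating apparatus and an audio signal playing apparatus.
The audio signal generating apparatus includes the following modules:
an audio signal distance information acquiring module, configured to generate, according to acquired direction information of an audio signal and an auxiliary video, distance information of the audio signal corresponding to the position of a viewpoint, where the auxiliary video is a disparity map or a depth map;
an audio signal encoding module, configured to encode and send the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
The audio signal playing apparatus includes the following modules:
an audio signal decoding module, configured to decode received encoded data to obtain an audio signal and direction information of the audio signal;
a receiving-end audio signal distance information acquiring module, configured to acquire distance information of the audio signal;
a speaker signal acquiring module, configured to process the audio signal by using an audio signal reproduction method according to the direction information and the distance information of the audio signal, to obtain a speaker signal corresponding to each speaker;
a speaker signal playing module, configured to play the speaker signals by using a speaker array or a surround sound system.
Without increasing the size of the microphone array, the embodiments of the present invention combine three-dimensional video signals with three-dimensional audio signals to accurately obtain the position information of an audio signal, including direction information and distance information, and thereby enable transmission and playback of the audio signal.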
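Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and persons of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.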
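FIG. 1A is a horizontal view of an original conference site layout in the prior art;
FIG. 1B is a schematic diagram of the scene of the site in FIG. 1A as displayed on the screen of a reproducing site in the prior art;
FIG. 2 is a top view of a conference site layout in the prior art;
FIG. 3 is a schematic flowchart of Embodiment 1 of the audio signal generating method according to the present invention;
FIG. 4 is a schematic flowchart of Embodiment 2 of the audio signal generating method according to the present invention;
FIG. 5 is a schematic diagram of calculating the horizontal coordinate of the audio signal in the auxiliary video in Embodiment 2 of the audio signal generating method according to the present invention;
FIG. 6 is a schematic diagram of calculating the vertical coordinate of the audio signal in the auxiliary video in Embodiment 2 of the audio signal generating method according to the present invention;
FIG. 7 is a schematic diagram of the relationship among image disparity, depth, and the distance from the viewpoint to the display in a parallel camera system in Embodiment 2 of the audio signal generating method according to the present invention;
FIG. 8 is a schematic diagram, in the XZ plane, of calculating the distance of the audio signal in Embodiment 2 of the audio signal generating method according to the present invention;
FIG. 9 is a schematic diagram, in the YZ plane, of calculating the distance of the audio signal in Embodiment 2 of the audio signal generating method according to the present invention;
FIG. 10 is a schematic flowchart of Embodiment 3 of the audio signal generating method according to the present invention;
FIG. 11 is a schematic structural diagram of Embodiment 1 of the audio signal generating apparatus according to the present invention;
FIG. 12 is a schematic structural diagram of Embodiment 2 of the audio signal generating apparatus according to the present invention;
FIG. 13 is a schematic structural diagram of Embodiment 3 of the audio signal generating apparatus according to the present invention;
FIG. 14 is a schematic flowchart of Embodiment 1 of the audio signal playing method according to the present invention;
FIG. 15 is a schematic flowchart of Embodiment 2 of the audio signal playing method according to the present invention;
FIG. 16 is a schematic flowchart of Embodiment 3 of the audio signal playing method according to the present invention;
FIG. 17 is a schematic flowchart of Embodiment 4 of the audio signal playing method according to the present invention;
FIG. 18 is a schematic structural diagram of Embodiment 1 of the audio signal playing apparatus according to the present invention;
FIG. 19 is a schematic structural diagram of Embodiment 2 of the audio signal playing apparatus according to the present invention;
FIG. 20 is a schematic structural diagram of Embodiment 3 of the audio signal playing apparatus according to the present invention;
FIG. 21 is a schematic structural diagram of Embodiment 4 of the audio signal playing apparatus according to the present invention;
FIG. 22 is a schematic structural diagram of an embodiment of the audio signal processing system according to the present invention.
Detailed Description of Embodiments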
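The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Embodiment 1 of the audio signal generating method
FIG. 3 is a schematic flowchart of Embodiment 1 of the audio signal generating method according to the present invention. The method may specifically include the following steps:
Step 11: Generate, according to the acquired direction information of the audio signal and the auxiliary video, distance information of the audio signal corresponding to the position of the viewpoint, where the auxiliary video is a disparity map or a depth map.
Step 12: Encode and send the audio signal, the direction information of the audio signal, and the distance information of the audio signal.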
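As an illustration of step 12 only, the sketch below packs the two direction angles of one audio frame (called alpha and beta here), together with its distance, into a small binary header placed in front of raw 16-bit PCM samples. The header layout, the use of Python's `struct` module and the names `encode_frame`/`decode_frame` are assumptions made for this example. The description does not specify an encoding format, and a practical system would compress the audio itself with a real codec.

```python
import struct

# Hypothetical side-information layout, NOT taken from the patent:
# little-endian float32 alpha, float32 beta, float32 distance, uint32 sample count,
# followed by that many signed 16-bit PCM samples.
_HEADER = struct.Struct("<fffI")


def encode_frame(alpha, beta, distance, samples):
    """Pack the direction angles, the distance and int16 PCM samples into bytes."""
    header = _HEADER.pack(alpha, beta, distance, len(samples))
    body = struct.pack("<%dh" % len(samples), *samples)
    return header + body


def decode_frame(data):
    """Inverse of encode_frame: return (alpha, beta, distance, samples)."""
    alpha, beta, distance, count = _HEADER.unpack_from(data, 0)
    samples = list(struct.unpack_from("<%dh" % count, data, _HEADER.size))
    return alpha, beta, distance, samples


frame = encode_frame(0.5, -0.25, 2.0, [0, 1000, -1000])
print(decode_frame(frame))  # (0.5, -0.25, 2.0, [0, 1000, -1000])
```

The angles are stored as float32, so values that are not exactly representable come back slightly rounded. Side information of this size is negligible next to the audio payload.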
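Without increasing the size of the microphone array, this embodiment accurately obtains the distance information of the audio signal from the acquired direction information of the audio signal and the auxiliary video, and thereby enables transmission of the audio signal.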
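Embodiment 2 of the audio signal generating method
FIG. 4 is a schematic flowchart of Embodiment 2 of the audio signal generating method according to the present invention. Based on the technical solution shown in FIG. 3, the following steps may be performed before step 11:
Step 21: A microphone array captures at least two audio signals as an input audio stream.
Step 22: Process the input audio stream by using a microphone array processing method to obtain an enhanced audio signal and the direction information of the audio signal.
Step 23: A camera group captures at least two video signals as an input video stream.
Step 24: Obtain a main video and an auxiliary video according to the input video stream.
Optionally, based on the technical solution shown in FIG. 3, step 11 may specifically include the following steps:
Step 26: Acquire depth information of the audio signal according to the direction information of the audio signal and the auxiliary video.
Step 27: Acquire coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal.
Step 28: Generate distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
Optionally, the following step may be further included before step 26:
Step 25: Convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group.
Optionally, step 26 may specifically include the following steps:
Step 261: Acquire the coordinates of the audio signal in the auxiliary video according to the direction information of the audio signal, and determine whether the auxiliary video is a depth map or a disparity map. If it is a depth map, perform step 262; if it is a disparity map, perform step 263.
Step 262: Read the depth information corresponding to the audio signal directly from the depth map at these coordinates.
Step 263: Read the disparity corresponding to the audio signal from the disparity map at these coordinates, and calculate the depth information corresponding to the audio signal from the disparity.
Optionally, step 21 may specifically include the following steps:
Step 211: A microphone array captures at least two audio signals as a first input audio stream, where each audio signal is a mixed audio signal composed of the sounds of multiple sound sources.
Step 212: Separate the audio signals in the first input audio stream by using an audio signal separation method to obtain an audio signal corresponding to the sound of each sound source, and form the input audio stream from these per-source audio signals.
In steps 21 and 22, in the simplest case the microphone array consists of two microphones, so the input audio stream contains at least two audio signals. A microphone array processing method, such as beamforming, then processes the input audio stream to obtain the enhanced audio signal and its direction information.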
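The description names beamforming as one possible microphone array processing method but does not detail it. For the simplest two-microphone case, a much simpler stand-in can recover the direction: estimate the time difference of arrival by cross-correlation and convert it to an angle with theta = asin(c * tau / d). This is not beamforming, and not necessarily what the authors use. The sketch assumes a far-field source, a known microphone spacing `mic_spacing`, a speed of sound of 343 m/s and integer-sample delays. All function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_lag(x, y, max_lag):
    """Return the integer lag k (|k| <= max_lag) maximising sum x[n] * y[n + k]."""
    best_lag, best_score = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + k
            if 0 <= m < len(y):
                score += x[n] * y[m]
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag


def direction_of_arrival(x, y, fs, mic_spacing):
    """Angle (radians) from broadside of a two-microphone pair.

    Positive when the sound reaches microphone x before microphone y.
    """
    # The physically possible delay is bounded by spacing / speed of sound.
    max_lag = int(math.ceil(mic_spacing / SPEED_OF_SOUND * fs))
    lag = estimate_lag(x, y, max_lag)
    s = SPEED_OF_SOUND * (lag / fs) / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp rounding errors before asin
    return math.asin(s)
```

With integer lags the angular resolution is coarse at low sampling rates. Real systems usually interpolate the correlation peak or use generalized cross-correlation (for example GCC-PHAT).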
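In steps 23 and 24, in the simplest case the camera group consists of two cameras, so the input video stream contains at least two video signals. The main video and the auxiliary video are then obtained from the input video stream. With two cameras, one auxiliary video can be obtained; if the camera group consists of more than two cameras, multiple auxiliary videos can be obtained. One or more video streams in the input video stream are also selected as the main video; in the simplest case, with two cameras, the video captured by one of them is taken as the main video.
It should be noted that there is no strict order between steps 21 and 22 on the one hand and steps 23 and 24 on the other. In a specific implementation, steps 23 and 24 may be performed before steps 21 and 22, and any such change of order still achieves the technical effects of the embodiments of the present invention.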
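The following describes how steps 26 to 28 of this embodiment obtain the distance information of the audio signal. FIG. 5 is a schematic diagram of calculating the horizontal coordinate of the audio signal in the auxiliary video in Embodiment 2 of the audio signal generating method according to the present invention. In the figure, the origin O corresponds to the center of the camera lens, and the z axis is perpendicular to the camera lens. The plane formed by the x axis and the y axis is perpendicular to the z axis. The plane in which spatial point P1 lies contains the sound source point in the conference site and is perpendicular to the z axis; its distance from point O along the z axis is the object distance. The plane in which spatial point P2 lies contains the image point of the sound source and is perpendicular to the z axis; its distance from point O along the z axis is the image distance, which equals the focal length $f$ of the camera. Let $w$ and $h$ be the distances from the image point P2 of the sound source, formed through the camera, to the y axis and the x axis respectively. The microphone array measures the angle α between the z axis and the projection onto the XZ plane of the vector from point O to point S. By the properties of right triangles:

$$w = f \tan\alpha \tag{1}$$

FIG. 6 is a schematic diagram of calculating the vertical coordinate of the audio signal in the auxiliary video in Embodiment 2 of the audio signal generating method according to the present invention. The microphone array measures the angle β between the z axis and the projection onto the YZ plane of the vector from point O to point S. By the properties of right triangles:

$$h = f \tan\beta \tag{2}$$

Formulas (1) and (2) give the coordinates $(w, h)$ of the image point P2 corresponding to the sound source point S. The auxiliary video has the same size and position as the camera image. The coordinates $(w, h)$ of P2 are therefore also the coordinates of the point corresponding to S in the auxiliary video.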
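The image-plane relations w = f·tan(α) and h = f·tan(β) (formulas (1) and (2)) can be written directly as code. The helper `to_pixel` goes one step further and turns the image-plane coordinates (w, h) into indices that could be used to sample the auxiliary video. That step rests on assumptions the description does not state: square pixels of size `pixel_pitch`, a principal point at `(cx, cy)`, and rows counted downwards while the y axis points up. Both function names are hypothetical.

```python
import math


def image_point(f, alpha, beta):
    """Image-plane coordinates (w, h) of the sound source: w = f tan(alpha), h = f tan(beta).

    f     -- focal length (image distance), in any length unit
    alpha -- angle between the z axis and the XZ-plane projection of the source direction
    beta  -- angle between the z axis and the YZ-plane projection of the source direction
    """
    return f * math.tan(alpha), f * math.tan(beta)


def to_pixel(w, h, pixel_pitch, cx, cy):
    """Hypothetical conversion from image-plane units to (column, row) indices.

    Assumes square pixels of size pixel_pitch (same unit as w and h), principal
    point at (cx, cy), and image rows increasing downwards while y points up.
    """
    return round(cx + w / pixel_pitch), round(cy - h / pixel_pitch)
```

Because the auxiliary video shares the camera image's size and position, the same pixel indices address the depth or disparity sample for the source.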
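If the auxiliary video is a depth map, the depth information corresponding to the sound source point S is read directly from the depth map at these coordinates. If the auxiliary video is a disparity map, the disparity $p$ corresponding to S is read from the disparity map at these coordinates. The depth is then calculated as:

$$z_p = \frac{p\,D}{p - x_B} \tag{3}$$

where $z_p$ denotes the depth, $p$ the disparity, $D$ the distance from the viewpoint to the display, and $x_B$ the distance between a person's two eyes.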
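The derivation of formula (3) is as follows. FIG. 7 is a schematic diagram of the relationship among image disparity, depth, and the distance from the viewpoint to the display in a parallel camera system in Embodiment 2 of the audio signal generating method according to the present invention. The origin O of the coordinate system lies on the display screen, the Z axis points toward the viewpoint, and the X axis corresponds to the display screen. $z_p$ denotes the depth, $p$ the disparity, $D$ the distance from the viewpoint to the display, and $x_B$ the distance between a person's two eyes. The left and right eyes are located at $(0, D)$ and $(x_B, D)$, and the audio signal is located at $(x_p, z_p)$. $x_L$ and $x_R$ are the screen coordinates of this point in the left-eye view and the right-eye view respectively, and the distance $p$ between $x_L$ and $x_R$ is the disparity.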
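Simple geometric relationships (similar triangles) give:

$$\frac{x_L}{D} = \frac{x_p}{D - z_p} \qquad \text{and} \qquad \frac{x_R - x_B}{D} = \frac{x_p - x_B}{D - z_p}$$

Combining the two equations gives:

$$x_R - x_L = x_B - \frac{x_B D}{D - z_p} = -\frac{x_B\, z_p}{D - z_p}$$

Letting $p = x_R - x_L$, the relationship between the disparity $p$ and the depth $z_p$ is:

$$p = \frac{x_B\, z_p}{z_p - D}$$

which can be further rearranged into formula (3):

$$z_p = \frac{p\,D}{p - x_B}$$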
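The disparity-to-depth relation z_p = p·D / (p - x_B), with p = x_R - x_L, can be checked numerically. The sketch below projects a point onto the screen from both eyes, forms the disparity, and recovers the depth. `depth_at` mirrors steps 261 to 263: it reads the depth directly from a depth map, or converts a disparity sample. Representing the auxiliary video as a list of rows and the `kind` flag are assumptions made for this example. The relation is undefined when p = x_B, which is the limit of a point receding infinitely far behind the screen.

```python
def project_to_screen(xp, zp, D, xB):
    """Screen x-coordinates (x_L, x_R) of point (x_p, z_p) seen from eyes at (0, D) and (x_B, D)."""
    xL = xp * D / (D - zp)
    xR = xB + (xp - xB) * D / (D - zp)
    return xL, xR


def depth_from_disparity(p, D, xB):
    """z_p = p * D / (p - x_B), with disparity p = x_R - x_L (negative in front of the screen)."""
    return p * D / (p - xB)


def depth_at(aux, kind, col, row, D, xB):
    """Steps 261-263: read a depth sample directly, or convert a disparity sample.

    aux is assumed to be a row-major list of lists; kind is "depth" or "disparity".
    """
    value = aux[row][col]
    if kind == "depth":
        return value
    if kind == "disparity":
        return depth_from_disparity(value, D, xB)
    raise ValueError("kind must be 'depth' or 'disparity'")
```

Points in front of the screen (0 < z_p < D) produce negative (crossed) disparity, while points behind it produce positive disparity smaller than x_B.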
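FIG. 8 is a schematic diagram, in the XZ plane, of calculating the distance of the audio signal in Embodiment 2 of the audio signal generating method according to the present invention. $(x_p, z_p)$ is the coordinate information of the audio signal, where the depth $z_p$ has already been obtained. The microphone array measures the angle α between the z axis and the projection onto the XZ plane of the vector from the coordinate origin O to point P. The horizontal coordinate $x_p$ of the audio signal can then be calculated as:

$$x_p = z_p \tan\alpha$$

After the coordinate information $(x_p, z_p)$ of the audio signal is obtained, a viewpoint V in the site is located at $(0, D)$. The distance information of the audio signal in the XZ plane is then the distance

$$|VP| = \sqrt{x_p^2 + (D - z_p)^2}$$

and the vector corresponding to this distance is $\overrightarrow{VP}$.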
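FIG. 9 is a schematic diagram, in the YZ plane, of calculating the distance of the audio signal in Embodiment 2 of the audio signal generating method according to the present invention. $(y_p, z_p)$ is the coordinate information of the audio signal. The microphone array measures the angle β between the z axis and the projection onto the YZ plane of the vector from the coordinate origin O to point P. The vertical coordinate $y_p$ of the audio signal in the display site can then be calculated as:

$$y_p = z_p \tan\beta$$

After the coordinate information $(y_p, z_p)$ is obtained, a viewpoint V in the site is located at $(0, D)$. The distance information of the audio signal in the YZ plane is then

$$|VP| = \sqrt{y_p^2 + (D - z_p)^2}$$

and the vector corresponding to this distance is $\overrightarrow{VP}$.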
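The two planar computations (x_p = z_p·tan α and y_p = z_p·tan β, followed by the distance to a viewpoint at (0, D)) fit in a few lines. This sketch assumes the depth z_p is already known and that the viewpoint sits at (0, D) in both planes, as described. The function names are hypothetical. The description treats the XZ and YZ planes separately, so the sketch returns one distance per plane rather than a single three-dimensional distance.

```python
import math


def source_coordinates(zp, alpha, beta):
    """x_p = z_p tan(alpha) in the XZ plane and y_p = z_p tan(beta) in the YZ plane."""
    return zp * math.tan(alpha), zp * math.tan(beta)


def planar_distances(zp, alpha, beta, D):
    """Distances |VP| from a viewpoint V at (0, D) to the source, in the XZ and YZ planes."""
    xp, yp = source_coordinates(zp, alpha, beta)
    d_xz = math.hypot(xp, D - zp)  # sqrt(x_p^2 + (D - z_p)^2)
    d_yz = math.hypot(yp, D - zp)  # sqrt(y_p^2 + (D - z_p)^2)
    return d_xz, d_yz
```

For example, with z_p = 1, D = 5, tan α = 3 and β = 0, the source sits 3 units to the side and 4 units in front of the viewer. The XZ distance is therefore 5 and the YZ distance is 4.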
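Optionally, this embodiment may further include the following step:
Step 210: Encode and send the auxiliary video.
In this embodiment, the microphone array and the camera group obtain the input audio stream and the input video stream respectively. The direction information of the audio signal and the auxiliary video are then obtained from these streams, and the distance information of the audio signal is calculated from the direction information and the auxiliary video. In this way, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information. This in turn enables transmission of the audio signal.
Further, the microphone array's environment may contain multiple non-noise sound sources, for example when several people speak at the same time. In that case, this embodiment uses an audio signal separation method to separate the audio signal corresponding to each sound source from the first input audio stream captured by the microphone array. It forms the input audio stream from these signals and continues to process it. The position information of the audio signal corresponding to each sound source, including direction information and distance information, is thereby obtained accurately.
Further, in an actual deployment, the coordinate system used by the microphone array to measure the sound source direction does not necessarily coincide with the coordinate system of the camera system. The two coordinate systems therefore need to be transformed so that all calculations are performed in the same coordinate system.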
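The description does not say how the two coordinate systems are related. One common approach is sketched here under stated assumptions. The measured direction is expressed as a vector (tan α, tan β, 1), rotated with a known mounting rotation R from the microphone-array frame to the camera frame, and converted back to angles. The example ignores the translation between the two devices, which is only reasonable when the source is far away compared with their spacing. `yaw_matrix` is a hypothetical helper for a pure rotation about the y axis.

```python
import math


def angles_to_vector(alpha, beta):
    """Direction vector with tan(alpha) = x/z and tan(beta) = y/z (source in front, z > 0)."""
    return (math.tan(alpha), math.tan(beta), 1.0)


def vector_to_angles(v):
    """Inverse of angles_to_vector for any vector with z > 0."""
    x, y, z = v
    return math.atan2(x, z), math.atan2(y, z)


def yaw_matrix(theta):
    """Rotation about the y axis by theta radians (hypothetical mounting offset)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def mic_to_camera(alpha, beta, R):
    """Re-express a microphone-array direction (alpha, beta) in the camera frame.

    Only the rotation R is applied; the translation between the two devices is
    ignored, which assumes the source is far away compared with their spacing.
    """
    v = angles_to_vector(alpha, beta)
    w = tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))
    return vector_to_angles(w)
```

If the source is close to the devices, the translation matters. The direction then has to be combined with a range estimate (for example the depth z_p) before changing frames.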
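Without increasing the size of the microphone array, the embodiments of the present invention combine three-dimensional video signals with three-dimensional audio signals to accurately obtain the position information of the audio signal, including direction information and distance information, and thereby enable transmission and playback of the audio signal.
Embodiment 3 of the audio signal generating method
Optionally, FIG. 10 is a schematic flowchart of Embodiment 3 of the audio signal generating method according to the present invention. Based on the technical solution shown in FIG. 4, step 210 may alternatively be:
Step 213: Encode and send the main video and the auxiliary video.
Without increasing the size of the microphone array, this embodiment combines three-dimensional video signals with three-dimensional audio signals to accurately obtain the position information of the audio signal, including direction information and distance information, and thereby enables transmission of both the audio signal and the video signal.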
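Embodiment 1 of the audio signal generating apparatus
FIG. 11 is a schematic structural diagram of Embodiment 1 of the audio signal generating apparatus according to the present invention. The apparatus may specifically include an audio signal distance information acquiring module 31 and an audio signal encoding module 32, and the audio signal encoding module 32 is connected to the audio signal distance information acquiring module 31. The audio signal distance information acquiring module 31 is configured to generate, according to the acquired direction information of the audio signal and the auxiliary video, distance information of the audio signal corresponding to the position of the viewpoint, where the auxiliary video is a disparity map or a depth map.
The audio signal encoding module 32 is configured to encode and send the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
In this embodiment, the audio signal distance information acquiring module 31 generates the distance information of the audio signal from the acquired direction information and the auxiliary video. The audio signal encoding module 32 then encodes and sends the audio signal, its direction information and its distance information. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and transmission of the audio signal is achieved.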
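Embodiment 2 of the audio signal generating apparatus
FIG. 12 is a schematic structural diagram of Embodiment 2 of the audio signal generating apparatus according to the present invention. Based on the structure shown in FIG. 11, the audio signal distance information acquiring module 31 may specifically include a depth information acquiring unit 311, a coordinate information acquiring unit 312 and a distance information acquiring unit 313. The coordinate information acquiring unit 312 is connected to the depth information acquiring unit 311, and the distance information acquiring unit 313 is connected to the coordinate information acquiring unit 312. The depth information acquiring unit 311 is configured to acquire the depth information of the audio signal according to the direction information of the audio signal and the auxiliary video. The coordinate information acquiring unit 312 is configured to acquire the coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal. The distance information acquiring unit 313 is configured to generate distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint, and to send this distance information to the audio signal encoding module 32.
This embodiment may further include a microphone array 33, an audio input signal processing module 34, a video capturing module 35 and a video input signal processing module 36. The audio input signal processing module 34 is connected to the microphone array 33, and the video input signal processing module 36 is connected to the video capturing module 35. The microphone array 33 is configured to capture at least two audio signals as the input audio stream. The audio input signal processing module 34 is configured to process the input audio stream by using a microphone array processing method to obtain the enhanced audio signal and its direction information, and to send both to the audio signal encoding module 32. The video capturing module 35 is configured to capture, by means of a camera group, at least two video signals as the input video stream. The video input signal processing module 36 is configured to obtain the main video and the auxiliary video according to the input video stream.
The microphone array 33 may specifically include a microphone array unit 330 and an audio signal separation unit 331. The microphone array unit 330 is configured to capture at least two audio signals as a first input audio stream, each of which is a mixed audio signal composed of the sounds of multiple sound sources. The audio signal separation unit 331 is configured to separate the audio signals in the first input audio stream by using an audio signal separation method and obtain the audio signal corresponding to the sound of each sound source. It then forms the input audio stream from these per-source signals and sends the input audio stream to the audio input signal processing module 34.
Optionally, the audio signal distance information acquiring module 31 may further include a coordinate transformation unit 314, connected to the video input signal processing module 36 and the audio input signal processing module 34. This unit is configured to convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group. It sends the transformed auxiliary video and direction information to the depth information acquiring unit 311, and sends the transformed direction information to the coordinate information acquiring unit 312.
Optionally, this embodiment may further include a first video encoding module 38, connected to the video input signal processing module 36 and configured to encode and send the auxiliary video.
Optionally, this embodiment may further include a sending-end communication interface 39, connected to the audio signal encoding module 32 and the first video encoding module 38 and configured to send the encoded data over a network.
The video capturing module 35 usually captures the scene with a camera group consisting of two cameras. It may instead use a depth camera that outputs depth information directly, in which case the video input signal processing module 36 is no longer needed. Likewise, if the microphone array 33 provides the functions of the audio input signal processing module 34, the audio input signal processing module 34 is no longer needed.
In this embodiment, the audio signal distance information acquiring module 31 generates the distance information of the audio signal from the acquired direction information and the auxiliary video. The audio signal encoding module 32 encodes and sends the audio signal, its direction information and its distance information, and the first video encoding module 38 encodes and sends the auxiliary video. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and transmission of the audio signal and the auxiliary video is achieved.
Further, in an actual deployment, the coordinate system used by the microphone array to measure the sound source direction does not necessarily coincide with the coordinate system of the camera system. The coordinate transformation unit 314 therefore transforms between the two coordinate systems so that all calculations are performed in the same coordinate system.
Without increasing the size of the microphone array, the embodiments of the present invention combine three-dimensional video signals with three-dimensional audio signals to accurately obtain the position information of the audio signal, including direction information and distance information, and thereby enable transmission and playback of the audio signal.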
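Embodiment 3 of the audio signal generating apparatus
FIG. 13 is a schematic structural diagram of Embodiment 3 of the audio signal generating apparatus according to the present invention. Based on the structure shown in FIG. 12, the first video encoding module 38 may alternatively be a second video encoding module 315, configured to encode and send the main video and the auxiliary video.
Based on the structure shown in FIG. 12, the sending-end communication interface 39 is connected to the audio signal encoding module 32 and the second video encoding module 315.
In this embodiment, the audio signal distance information acquiring module 31 generates the distance information of the audio signal from the acquired direction information and the auxiliary video. The audio signal encoding module 32 encodes and sends the audio signal, its direction information and its distance information, and the second video encoding module 315 encodes and sends the main video and the auxiliary video. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and transmission of the audio signal and the video signal is achieved.
Further, in an actual deployment, the coordinate system used by the microphone array to measure the sound source direction does not necessarily coincide with the coordinate system of the camera system. The coordinate transformation unit 314 therefore transforms between the two coordinate systems so that all calculations are performed in the same coordinate system.
Without increasing the size of the microphone array, the embodiments of the present invention combine three-dimensional video signals with three-dimensional audio signals to accurately obtain the position information of the audio signal, including direction information and distance information, and thereby enable transmission and playback of the audio signal.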
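Embodiment 1 of the audio signal playing method
FIG. 14 is a schematic flowchart of Embodiment 1 of the audio signal playing method according to the present invention. The method may specifically include the following steps:
Step 41: Decode the received encoded data to obtain the audio signal and the direction information of the audio signal.
Step 42: Acquire the distance information of the audio signal.
Step 43: Process the audio signal by using an audio signal reproduction method according to the direction information and the distance information of the audio signal, to obtain a speaker signal corresponding to each speaker.
Step 44: Play the speaker signals by using a speaker array or a surround sound system.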
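The description leaves the audio signal reproduction method of step 43 open; wave field synthesis is mentioned in the background. The sketch below uses a much simpler, hypothetical stand-in. It places a virtual point source from the decoded azimuth and distance, with the listener at the origin facing +z and only the horizontal plane considered. Each speaker then receives a copy of the signal delayed by its extra travel time and scaled by 1/r relative to the nearest speaker. Speaker positions, the 343 m/s speed of sound and integer-sample delays are all assumptions, and the function names are illustrative only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def virtual_source(alpha, distance):
    """Horizontal-plane source position for azimuth alpha and distance; listener at origin facing +z."""
    return distance * math.sin(alpha), distance * math.cos(alpha)


def speaker_feeds(signal, alpha, distance, speakers, fs):
    """Return one delayed, attenuated copy of `signal` per speaker position (x, z).

    Point-source model: each speaker gets the extra delay (r - r_min) / c and the
    gain r_min / r, where r is the distance from the virtual source to that speaker,
    so the nearest speaker is undelayed with gain 1.
    """
    sx, sz = virtual_source(alpha, distance)
    # Clamp to 1 mm so a source placed exactly on a speaker cannot divide by zero.
    dists = [max(math.hypot(px - sx, pz - sz), 1e-3) for px, pz in speakers]
    r_min = min(dists)
    feeds = []
    for r in dists:
        delay = int(round((r - r_min) / SPEED_OF_SOUND * fs))
        gain = r_min / r
        feeds.append([0.0] * delay + [gain * s for s in signal])
    return feeds
```

The distance information is what changes the relative delays and gains between speakers. Without it, a front-row and a back-row talker on the same line of sight, like P1 and P2 in FIG. 1B, would be rendered identically.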
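In this embodiment, the received encoded data is decoded to obtain the audio signal and its direction information, and the distance information of the audio signal is acquired. The audio signal is then processed according to its direction information and distance information to obtain speaker signals, and the speaker signals are played. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and playback of the audio signal is achieved.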
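Embodiment 2 of the audio signal playing method
Optionally, FIG. 15 is a schematic flowchart of Embodiment 2 of the audio signal playing method according to the present invention. Based on the technical solution shown in FIG. 14, step 42 may specifically include:
Step 421: Decode the received encoded data to obtain the distance information of the audio signal.
In this embodiment, the received encoded data is decoded to obtain the audio signal, its direction information and its distance information. The audio signal is processed according to the direction information and the distance information to obtain speaker signals, and the speaker signals are played. Thus, without increasing the size of the microphone array, the position information of the audio signal, including direction information and distance information, is obtained accurately simply by decoding the received encoded data, and playback of the audio signal is achieved.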
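Embodiment 3 of the audio signal playing method
FIG. 16 is a schematic flowchart of Embodiment 3 of the audio signal playing method according to the present invention. Optionally, based on the technical solution shown in FIG. 14, the method may further include:
Step 51: Decode the received encoded data to obtain the auxiliary video.
Optionally, based on the technical solution shown in FIG. 14, step 42 may specifically include:
Step 422: Acquire depth information of the audio signal according to the direction information of the audio signal and the auxiliary video.
Step 423: Acquire coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal.
Step 424: Generate distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
Optionally, the following step may be further included before step 422:
Step 421: Convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group.
In this embodiment, the received encoded data is decoded to obtain the audio signal, its direction information and the auxiliary video. The distance information of the audio signal is acquired from the direction information and the auxiliary video. The audio signal is then processed according to the direction information and the distance information to obtain speaker signals, and the speaker signals are played. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and playback of the audio signal is achieved.
Further, in an actual deployment, the coordinate system used by the microphone array to measure the sound source direction does not necessarily coincide with the coordinate system of the camera system. The two coordinate systems therefore need to be transformed so that all calculations are performed in the same coordinate system.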
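Embodiment 4 of the audio signal playing method
FIG. 17 is a schematic flowchart of Embodiment 4 of the audio signal playing method according to the present invention. Optionally, based on the technical solution shown in FIG. 14, the method may further include:
Step 52: Decode the received encoded data to obtain the auxiliary video and the main video.
Optionally, based on the technical solution shown in FIG. 14, step 42 may specifically include:
Step 53: Acquire depth information of the audio signal according to the direction information of the audio signal and the auxiliary video.
Step 54: Acquire coordinate information of the audio signal in the display site according to the depth information and the direction information of the audio signal.
Step 55: Generate distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
Optionally, the following step may be further included before step 53:
Step 50: Convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group.
Optionally, based on the technical solution shown in FIG. 14, the following steps may be further included:
Step 56: Process the main video and the auxiliary video by using a three-dimensional video display method to obtain a display video signal.
Step 57: Play the display video signal.
In this embodiment, the received encoded data is decoded to obtain the audio signal, its direction information, the auxiliary video and the main video. The distance information of the audio signal is acquired from the direction information and the auxiliary video. The audio signal is then processed according to the direction information and the distance information to obtain speaker signals, and the speaker signals are played. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and playback of the audio signal is achieved.
Further, this embodiment processes the main video and the auxiliary video to obtain a display video signal and plays it. The video signal is thereby played together with, and matched to, the audio signal.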
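Embodiment 1 of the audio signal playing apparatus
FIG. 18 is a schematic structural diagram of Embodiment 1 of the audio signal playing apparatus according to the present invention. The apparatus may specifically include an audio signal decoding module 316, a receiving-end audio signal distance information acquiring module 317, a speaker signal acquiring module 318 and a speaker signal playing module 319. The receiving-end audio signal distance information acquiring module 317 is connected to the audio signal decoding module 316. The speaker signal acquiring module 318 is connected to both the audio signal decoding module 316 and the receiving-end audio signal distance information acquiring module 317. The speaker signal playing module 319 is connected to the speaker signal acquiring module 318. The modules work as follows:
- The audio signal decoding module 316 is configured to decode the received encoded data to obtain the audio signal and its direction information.
- The receiving-end audio signal distance information acquiring module 317 is configured to acquire the distance information of the audio signal.
- The speaker signal acquiring module 318 is configured to receive the audio signal and its direction information from the audio signal decoding module 316 and the distance information from the receiving-end audio signal distance information acquiring module 317. It processes the audio signal by using an audio signal reproduction method according to the direction information and the distance information, to obtain a speaker signal corresponding to each speaker.
- The speaker signal playing module 319 is configured to play the speaker signals by using a speaker array or a surround sound system.
If the speaker signal playing module 319, for example a speaker array, provides the functions of the speaker signal acquiring module 318, the speaker signal acquiring module 318 is no longer needed.
In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information, and the receiving-end audio signal distance information acquiring module 317 acquires the distance information of the audio signal. The speaker signal acquiring module 318 processes the audio signal according to the direction information and the distance information to obtain speaker signals, and the speaker signal playing module 319 plays them. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and playback of the audio signal is achieved.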
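Embodiment 2 of the audio signal playing apparatus
FIG. 19 is a schematic structural diagram of Embodiment 2 of the audio signal playing apparatus according to the present invention. Based on the structure shown in FIG. 18, the receiving-end audio signal distance information acquiring module 317 may specifically be an audio signal distance information decoding module 320, configured to decode the received encoded data to obtain the distance information of the audio signal.
This embodiment may further include a receiving-end communication interface 321, configured to receive encoded data sent over a network and pass it to the audio signal decoding module 316.
In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information. The audio signal distance information decoding module 320 decodes the received encoded data to obtain the distance information of the audio signal. The speaker signal acquiring module 318 processes the audio signal according to the direction information and the distance information to obtain speaker signals, and the speaker signal playing module 319 plays them. Thus, without increasing the size of the microphone array, the position information of the audio signal, including direction information and distance information, is obtained accurately by decoding the received encoded data, and playback of the audio signal is achieved.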
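Embodiment 3 of the audio signal playing apparatus
FIG. 20 is a schematic structural diagram of Embodiment 3 of the audio signal playing apparatus according to the present invention. Based on the structure shown in FIG. 18, the apparatus may further include a first video signal decoding module 322, configured to decode the received encoded data to obtain the auxiliary video.
Based on the structure shown in FIG. 18, the receiving-end audio signal distance information acquiring module 317 may specifically be the audio signal distance information acquiring module 31, connected to the audio signal decoding module 316 and the first video signal decoding module 322 and configured to generate the distance information of the audio signal according to its direction information and the auxiliary video.
The audio signal distance information acquiring module 31 may specifically include a depth information acquiring unit 311, a coordinate information acquiring unit 312 and a distance information acquiring unit 313. The coordinate information acquiring unit 312 is connected to the depth information acquiring unit 311, and the distance information acquiring unit 313 is connected to the coordinate information acquiring unit 312. The depth information acquiring unit 311 is configured to acquire the depth information of the audio signal according to its direction information and the auxiliary video. The coordinate information acquiring unit 312 is configured to acquire the coordinate information of the audio signal in the display site according to the depth information and the direction information. The distance information acquiring unit 313 is configured to generate distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
The audio signal distance information acquiring module 31 may further include a coordinate transformation unit 314, connected to the first video signal decoding module 322 and the audio signal decoding module 316. This unit is configured to convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group. It sends the transformed auxiliary video and direction information to the depth information acquiring unit 311, and sends the transformed direction information to the coordinate information acquiring unit 312.
Based on the structure shown in FIG. 18, this embodiment may further include a receiving-end communication interface 321, configured to receive encoded data sent over a network and send it to the audio signal decoding module 316 and the first video signal decoding module 322.
In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information. The audio signal distance information acquiring module 31 generates the distance information of the audio signal from the direction information and the auxiliary video. The speaker signal acquiring module 318 processes the audio signal according to the direction information and the distance information to obtain speaker signals, and the speaker signal playing module 319 plays them. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and playback of the audio signal is achieved.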
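Embodiment 4 of the audio signal playing apparatus
FIG. 21 is a schematic structural diagram of Embodiment 4 of the audio signal playing apparatus according to the present invention. Based on the structure shown in FIG. 18, the apparatus may further include a second video signal decoding module 323, a video output signal processing module 324 and a video output module 325. The video output signal processing module 324 is connected to the second video signal decoding module 323, and the video output module 325 is connected to the video output signal processing module 324. The second video signal decoding module 323 is configured to decode the received encoded data to obtain the auxiliary video and the main video. The video output signal processing module 324 is configured to process the main video and the auxiliary video by using a three-dimensional video display method to obtain a display video signal. The video output module 325 is configured to play the display video signal.
Based on the structure shown in FIG. 18, the receiving-end audio signal distance information acquiring module 317 may specifically be the audio signal distance information acquiring module 31, connected to the audio signal decoding module 316 and the second video signal decoding module 323 and configured to generate the distance information of the audio signal according to its direction information and the auxiliary video.
The audio signal distance information acquiring module 31 may specifically include a depth information acquiring unit 311, a coordinate information acquiring unit 312 and a distance information acquiring unit 313. The coordinate information acquiring unit 312 is connected to the depth information acquiring unit 311, and the distance information acquiring unit 313 is connected to the coordinate information acquiring unit 312. The depth information acquiring unit 311 is configured to acquire the depth information of the audio signal according to its direction information and the auxiliary video. The coordinate information acquiring unit 312 is configured to acquire the coordinate information of the audio signal in the display site according to the depth information and the direction information. The distance information acquiring unit 313 is configured to generate distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and the position information of the viewpoint.
The audio signal distance information acquiring module 31 may further include a coordinate transformation unit 314, connected to the audio signal decoding module 316 and the second video signal decoding module 323. This unit is configured to convert the auxiliary video and the direction information of the audio signal into the same coordinate system according to the position information of the microphone array and the camera group. It sends the transformed auxiliary video and direction information to the depth information acquiring unit 311, and sends the transformed direction information to the coordinate information acquiring unit 312.
Based on the structure shown in FIG. 18, this embodiment may further include a receiving-end communication interface 321, configured to receive encoded data sent over a network and send it to the audio signal decoding module 316 and the second video signal decoding module 323.
The video output module 325 is usually a stereoscopic display. If the stereoscopic display provides the functions of the video output signal processing module 324, the video output signal processing module 324 is no longer needed.
In this embodiment, the audio signal decoding module 316 decodes the received encoded data to obtain the audio signal and its direction information. The audio signal distance information acquiring module 31 generates the distance information of the audio signal from the direction information and the auxiliary video. The speaker signal acquiring module 318 processes the audio signal according to the direction information and the distance information to obtain speaker signals, and the speaker signal playing module 319 plays them. Thus, without increasing the size of the microphone array, three-dimensional video signals and three-dimensional audio signals are combined to accurately obtain the position information of the audio signal, including direction information and distance information, and playback of the audio signal is achieved.
Further, the video output signal processing module 324 processes the main video and the auxiliary video by using a three-dimensional video display method to obtain a display video signal, and the video output module 325 plays it. The video signal is thereby played together with, and matched to, the audio signal.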
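Embodiment of the audio signal processing system
FIG. 22 is a schematic structural diagram of an embodiment of the audio signal processing system according to the present invention. The audio signal processing system 329 may specifically include an audio signal generating apparatus 327 and an audio signal playing apparatus 328.
The audio signal generating apparatus 327 may specifically include an audio signal distance information acquiring module 31 and an audio signal encoding module 32, and the audio signal encoding module 32 is connected to the audio signal distance information acquiring module 31. The audio signal distance information acquiring module 31 is configured to generate, according to the acquired direction information of the audio signal and the auxiliary video, distance information of the audio signal corresponding to the position of the viewpoint, where the auxiliary video is a disparity map or a depth map. The audio signal encoding module 32 is configured to encode and send the audio signal, its direction information and its distance information.
The audio signal playing apparatus 328 may specifically include an audio signal decoding module 316, a receiving-end audio signal distance information acquiring module 317, a speaker signal acquiring module 318 and a speaker signal playing module 319. The receiving-end audio signal distance information acquiring module 317 is connected to the audio signal decoding module 316. The speaker signal acquiring module 318 is connected to both the audio signal decoding module 316 and the receiving-end audio signal distance information acquiring module 317. The speaker signal playing module 319 is connected to the speaker signal acquiring module 318. The modules work as follows:
- The audio signal decoding module 316 is configured to decode the received encoded data to obtain the audio signal and its direction information.
- The receiving-end audio signal distance information acquiring module 317 is configured to acquire the distance information of the audio signal.
- The speaker signal acquiring module 318 is configured to process the audio signal by using an audio signal reproduction method according to the direction information and the distance information, to obtain a speaker signal corresponding to each speaker.
- The speaker signal playing module 319 is configured to play the speaker signals by using a speaker array or a surround sound system.
This embodiment may further include an echo cancellation module 320, connected to the audio signal generating apparatus 327 and the audio signal playing apparatus 328 and configured to cancel echo.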
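The description does not say how the echo cancellation module works. A widely used technique for acoustic echo cancellation is a normalized least-mean-squares (NLMS) adaptive filter. It learns the loudspeaker-to-microphone echo path and subtracts its estimate from the microphone signal. The sketch below is a plain-Python illustration of that technique, not the module's actual design. The filter length `taps` and step size `mu` are arbitrary choices.

```python
def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-6):
    """Normalized LMS acoustic echo canceller (illustrative sketch).

    far_end -- samples sent to the loudspeaker
    mic     -- microphone samples: echo of far_end plus any near-end speech
    Returns the error signal, i.e. the microphone signal with the estimated echo removed.
    """
    w = [0.0] * taps       # adaptive estimate of the echo path impulse response
    x_buf = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))  # estimated echo
        e = d - y                                     # residual after cancellation
        norm = eps + sum(xi * xi for xi in x_buf)     # input power for normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        out.append(e)
    return out
```

A practical canceller also needs double-talk detection, so that near-end speech does not disturb the adaptation, and a filter long enough to cover the room's echo tail. Both are outside this sketch.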
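Without increasing the size of the microphone array, this embodiment combines three-dimensional video signals with three-dimensional audio signals to accurately obtain the position information of the audio signal, including direction information and distance information, and thereby enables transmission and playback of the audio signal.
Persons of ordinary skill in the art may understand that all or some of the steps of the foregoing method embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention is described in detail with reference to preferred embodiments, persons of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solutions of the present invention without departing from their spirit and scope.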
Claims
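1. A method for generating an audio signal, comprising:
generating, according to acquired direction information of an audio signal and an auxiliary video, distance information of the audio signal corresponding to a position of a viewpoint, wherein the auxiliary video is a disparity map or a depth map; and
encoding and sending the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
2. The method for generating an audio signal according to claim 1, wherein acquiring the direction information of the audio signal and the auxiliary video comprises:
capturing, by a microphone array, at least two audio signals as an input audio stream;
processing the input audio stream by using a microphone array processing method to obtain an enhanced audio signal and the direction information of the audio signal;
capturing, by a camera group, at least two video signals as an input video stream; and
obtaining a main video and the auxiliary video according to the input video stream.
3. The method for generating an audio signal according to claim 2, wherein capturing, by the microphone array, at least two audio signals as the input audio stream comprises:
capturing, by the microphone array, at least two audio signals as a first input audio stream, wherein each audio signal is a mixed audio signal composed of sounds of multiple sound sources; and
separating the audio signals in the first input audio stream by using an audio signal separation method to obtain an audio signal corresponding to the sound of each sound source, and forming the input audio stream from the audio signals corresponding to the sounds of the sound sources.
4. The method for generating an audio signal according to claim 1 or 2, further comprising:
encoding and sending the auxiliary video.
5. The method for generating an audio signal according to claim 1 or 2, further comprising:
encoding and sending a main video and the auxiliary video.
6. The method for generating an audio signal according to claim 1, wherein generating the distance information of the audio signal corresponding to the position of the viewpoint specifically comprises:
acquiring depth information of the audio signal according to the direction information of the audio signal and the auxiliary video;
acquiring coordinate information of the audio signal in a display site according to the depth information and the direction information of the audio signal; and
generating the distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and position information of the viewpoint.
7. The method for generating an audio signal according to claim 6, wherein acquiring the depth information of the audio signal specifically comprises:
acquiring coordinates of the audio signal in the auxiliary video according to the direction information of the audio signal, and determining whether the auxiliary video is a depth map or a disparity map;
if the auxiliary video is a depth map, acquiring the depth information corresponding to the audio signal directly from the depth map according to the coordinates; and
if the auxiliary video is a disparity map, acquiring a disparity corresponding to the audio signal from the disparity map according to the coordinates, and calculating the depth information corresponding to the audio signal according to the disparity.
8. The method for generating an audio signal according to claim 6, further comprising, before acquiring the depth information of the audio signal:
converting the auxiliary video and the direction information of the audio signal into a same coordinate system according to position information of a microphone array and a camera group.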
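9. An apparatus for generating an audio signal, comprising:
an audio signal distance information acquiring module, configured to generate, according to acquired direction information of an audio signal and an auxiliary video, distance information of the audio signal corresponding to a position of a viewpoint, wherein the auxiliary video is a disparity map or a depth map; and
an audio signal encoding module, configured to encode and send the audio signal, the direction information of the audio signal, and the distance information of the audio signal.
10. The apparatus for generating an audio signal according to claim 9, further comprising:
a microphone array, configured to capture at least two audio signals as an input audio stream;
an audio input signal processing module, configured to process the input audio stream by using a microphone array processing method to obtain an enhanced audio signal and direction information of the audio signal, and send the audio signal and the direction information of the audio signal to the audio signal encoding module;
a video capturing module, configured to capture at least two video signals as an input video stream; and
a video input signal processing module, configured to obtain a main video and the auxiliary video according to the input video stream.
11. The apparatus for generating an audio signal according to claim 10, wherein the microphone array specifically comprises:
a microphone array unit, configured to capture at least two audio signals as a first input audio stream, wherein each audio signal is a mixed audio signal composed of sounds of multiple sound sources; and
an audio signal separation unit, configured to separate the audio signals in the first input audio stream by using an audio signal separation method to obtain an audio signal corresponding to the sound of each sound source, and form the input audio stream from the audio signals corresponding to the sounds of the sound sources.
12. The apparatus for generating an audio signal according to claim 10, further comprising:
a first video encoding module, connected to the video input signal processing module and configured to encode and send the auxiliary video.
13. The apparatus for generating an audio signal according to claim 10, further comprising:
a second video encoding module, connected to the video input signal processing module and configured to encode and send the main video and the auxiliary video.
14. The apparatus for generating an audio signal according to claim 9, wherein the audio signal distance information acquiring module specifically comprises:
a depth information acquiring unit, configured to acquire depth information of the audio signal according to the direction information of the audio signal and the auxiliary video;
a coordinate information acquiring unit, configured to acquire coordinate information of the audio signal in a display site according to the depth information and the direction information of the audio signal; and
a distance information acquiring unit, configured to generate the distance information of the audio signal corresponding to the position of the viewpoint according to the coordinate information of the audio signal and position information of the viewpoint.
15. The apparatus for generating an audio signal according to claim 14, wherein the audio signal distance information acquiring module further comprises:
a coordinate transformation unit, configured to convert the auxiliary video and the direction information of the audio signal into a same coordinate system according to position information of a microphone array and a camera group, send the transformed auxiliary video and direction information of the audio signal to the depth information acquiring unit, and send the transformed direction information of the audio signal to the coordinate information acquiring unit.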
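16. A method for playing an audio signal, comprising:
decoding received encoded data to obtain an audio signal and direction information of the audio signal;
acquiring distance information of the audio signal;
processing the audio signal by using an audio signal reproduction method according to the direction information of the audio signal and the distance information of the audio signal, to obtain a speaker signal corresponding to each speaker; and
playing the speaker signals by using a speaker array or a surround sound system.
17. The method for playing an audio signal according to claim 16, further comprising, before acquiring the distance information of the audio signal:
decoding the received encoded data to obtain an auxiliary video.
18. The method for playing an audio signal according to claim 16, further comprising, before acquiring the distance information of the audio signal:
decoding the received encoded data to obtain an auxiliary video and a main video.
19. The method for playing an audio signal according to claim 18, wherein acquiring the distance information of the audio signal specifically comprises:
decoding the received encoded data to obtain the distance information of the audio signal.
20. The method for playing an audio signal according to any one of claims 16 to 18, wherein acquiring the distance information of the audio signal specifically comprises:
generating the distance information of the audio signal according to the direction information of the audio signal and the auxiliary video.
21. The method for playing an audio signal according to claim 20, wherein generating the distance information of the audio signal specifically comprises:
acquiring depth information of the audio signal according to the direction information of the audio signal and the auxiliary video;
acquiring coordinate information of the audio signal in a display site according to the depth information and the direction information of the audio signal; and
generating distance information of the audio signal corresponding to a position of a viewpoint according to the coordinate information of the audio signal and position information of the viewpoint.
22. The method for playing an audio signal according to claim 21, further comprising, before acquiring the depth information of the audio signal:
converting the auxiliary video and the direction information of the audio signal into a same coordinate system according to position information of a microphone array and a camera group.
23. The method for playing an audio signal according to claim 18, further comprising:
processing the main video and the auxiliary video by using a three-dimensional video display method to obtain a display video signal; and
playing the display video signal.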
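24. An apparatus for playing an audio signal, comprising:
an audio signal decoding module, configured to decode received encoded data to obtain an audio signal and direction information of the audio signal;
a receiving-end audio signal distance information acquiring module, configured to acquire distance information of the audio signal;
a speaker signal acquiring module, configured to receive the audio signal and the direction information of the audio signal from the audio signal decoding module, receive the distance information of the audio signal from the receiving-end audio signal distance information acquiring module, and process the audio signal by using an audio signal reproduction method according to the direction information and the distance information of the audio signal, to obtain a speaker signal corresponding to each speaker; and
a speaker signal playing module, configured to play the speaker signals by using a speaker array or a surround sound system.
25. The apparatus for playing an audio signal according to claim 24, further comprising:
a first video signal decoding module, configured to decode the received encoded data to obtain an auxiliary video.
26. The apparatus for playing an audio signal according to claim 24, further comprising:
a second video signal decoding module, configured to decode the received encoded data to obtain an auxiliary video and a main video;
a video output signal processing module, configured to process the main video and the auxiliary video by using a three-dimensional video display method to obtain a display video signal; and
a video output module, configured to play the display video signal.
27. The apparatus for playing an audio signal according to claim 24, wherein the receiving-end audio signal distance information acquiring module is specifically an audio signal distance information decoding module, configured to decode the received encoded data to obtain the distance information of the audio signal.
28. The apparatus for playing an audio signal according to claim 24, wherein the receiving-end audio signal distance information acquiring module is specifically an audio signal distance information acquiring module, configured to generate the distance information of the audio signal according to the direction information of the audio signal and the auxiliary video.
29. The apparatus for playing an audio signal according to claim 28, wherein the audio signal distance information acquiring module specifically comprises:
a depth information acquiring unit, configured to acquire depth information of the audio signal according to the direction information of the audio signal and the auxiliary video;
a coordinate information acquiring unit, configured to acquire coordinate information of the audio signal in a display site according to the depth information and the direction information of the audio signal; and
a distance information acquiring unit, configured to generate distance information of the audio signal corresponding to a position of a viewpoint according to the coordinate information of the audio signal and position information of the viewpoint.
30. The apparatus for playing an audio signal according to claim 29, wherein the audio signal distance information acquiring module further comprises:
a coordinate transformation unit, configured to convert the auxiliary video and the direction information of the audio signal into a same coordinate system according to position information of a microphone array and a camera group, send the transformed auxiliary video and direction information of the audio signal to the depth information acquiring unit, and send the transformed direction information of the audio signal to the coordinate information acquiring unit.
31. An audio signal processing system, comprising an audio signal generating apparatus and an audio signal playing apparatus, wherein:
the audio signal generating apparatus comprises:
an audio signal distance information acquiring module, configured to generate, according to acquired direction information of an audio signal and an auxiliary video, distance information of the audio signal corresponding to a position of a viewpoint, wherein the auxiliary video is a disparity map or a depth map; and
an audio signal encoding module, configured to encode and send the audio signal, the direction information of the audio signal, and the distance information of the audio signal;
the audio signal playing apparatus comprises:
an audio signal decoding module, configured to decode received encoded data to obtain an audio signal and direction information of the audio signal;
a receiving-end audio signal distance information acquiring module, configured to acquire distance information of the audio signal;
a speaker signal acquiring module, configured to process the audio signal by using an audio signal reproduction method according to the direction information and the distance information of the audio signal, to obtain a speaker signal corresponding to each speaker; and
a speaker signal playing module, configured to play the speaker signals by using a speaker array or a surround sound system.
32. The audio signal processing system according to claim 31, further comprising:
an echo cancellation module, connected to the audio signal generating apparatus and the audio signal playing apparatus and configured to cancel echo.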
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17187688.1A EP3319344B1 (en) | 2008-08-27 | 2009-08-21 | Method and apparatus for generating audio signal information |
EP09809218.2A EP2323425B1 (en) | 2008-08-27 | 2009-08-21 | Method and device for generating audio signals |
US13/035,400 US8705778B2 (en) | 2008-08-27 | 2011-02-25 | Method and apparatus for generating and playing audio signals, and system for processing audio signals |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008101191405A CN101350931B (zh) | 2008-08-27 | Method and apparatus for generating and playing audio signals, and system for processing audio signals |
CN200810119140.5 | 2008-08-27 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/035,400 Continuation US8705778B2 (en) | 2008-08-27 | 2011-02-25 | Method and apparatus for generating and playing audio signals, and system for processing audio signals |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010022633A1 true WO2010022633A1 (zh) | 2010-03-04 |
Family
ID=40269474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2009/073406 WO2010022633A1 (zh) | 2008-08-27 | 2009-08-21 | Method and apparatus for generating and playing audio signals, and system for processing audio signals |
Country Status (4)
Country | Link |
---|---|
US (1) | US8705778B2 (zh) |
EP (2) | EP2323425B1 (zh) |
CN (1) | CN101350931B (zh) |
WO (1) | WO2010022633A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012037073A1 (en) | 2010-09-13 | 2012-03-22 | Warner Bros. Entertainment Inc. | Method and apparatus for generating 3d audio positioning using dynamically optimized audio 3d space perception cues |
EP2566194A1 (en) * | 2010-11-26 | 2013-03-06 | Huawei Device Co., Ltd. | Method and device for processing audio in video communication |
US8705778B2 (en) | 2008-08-27 | 2014-04-22 | Huawei Technologies Co., Ltd. | Method and apparatus for generating and playing audio signals, and system for processing audio signals |
US10026452B2 (en) | 2010-06-30 | 2018-07-17 | Warner Bros. Entertainment Inc. | Method and apparatus for generating 3D audio positioning using dynamically optimized audio 3D space perception cues |
US10326978B2 (en) | 2010-06-30 | 2019-06-18 | Warner Bros. Entertainment Inc. | Method and apparatus for generating virtual or augmented reality presentations with 3D audio positioning |
US10453492B2 (en) | 2010-06-30 | 2019-10-22 | Warner Bros. Entertainment Inc. | Method and apparatus for generating encoded content using dynamically optimized conversion for 3D movies |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201116041A (en) * | 2009-06-29 | 2011-05-01 | Sony Corp | Three-dimensional image data transmission device, three-dimensional image data transmission method, three-dimensional image data reception device, three-dimensional image data reception method, image data transmission device, and image data reception |
US20110116642A1 (en) * | 2009-11-16 | 2011-05-19 | Harman International Industries, Incorporated | Audio System with Portable Audio Enhancement Device |
US8676581B2 (en) * | 2010-01-22 | 2014-03-18 | Microsoft Corporation | Speech recognition analysis via identification information |
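KR101844511B1 (ko) * | 2010-03-19 | 2018-05-18 | 삼성전자주식회사 | Method and apparatus for reproducing stereophonic sound |
JP5672741B2 (ja) | 2010-03-31 | 2015-02-18 | ソニー株式会社 | Signal processing apparatus and method, and program |
KR101717787B1 (ko) * | 2010-04-29 | 2017-03-17 | 엘지전자 주식회사 | Display apparatus and method for outputting audio signal thereof |
JP5573426B2 (ja) * | 2010-06-30 | 2014-08-20 | ソニー株式会社 | Audio processing apparatus, audio processing method, and program |
CN102387269B (zh) * | 2010-08-27 | 2013-12-04 | 华为终端有限公司 | Method, apparatus and system for echo cancellation in a single-talk state |
JP2012119738A (ja) * | 2010-11-29 | 2012-06-21 | Sony Corp | Information processing apparatus, information processing method, and program |
CN102595153A (zh) * | 2011-01-13 | 2012-07-18 | 承景科技股份有限公司 | Display system capable of dynamically providing three-dimensional sound effects, and related method |
CN102186049B (zh) * | 2011-04-22 | 2013-03-20 | 华为终端有限公司 | Audio signal processing method for a conference site terminal, conference site terminal, and video conference system |
CN102769764B (zh) * | 2011-05-03 | 2015-09-09 | 晨星软件研发(深圳)有限公司 | Method applied to a three-dimensional display and related apparatus |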
US9084068B2 (en) * | 2011-05-30 | 2015-07-14 | Sony Corporation | Sensor-based placement of sound in video recording |
KR101901908B1 (ko) * | 2011-07-29 | 2018-11-05 | 삼성전자주식회사 | Audio signal processing method and audio signal processing apparatus therefor |
US9291697B2 (en) | 2012-04-13 | 2016-03-22 | Qualcomm Incorporated | Systems, methods, and apparatus for spatially directive filtering |
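KR20150032253A (ko) * | 2012-07-09 | 2015-03-25 | 엘지전자 주식회사 | Enhanced 3D audio/video processing apparatus and method |
CN103634561A (zh) * | 2012-08-21 | 2014-03-12 | 徐丙川 | Conference communication device and system |
JP6216169B2 (ja) * | 2012-09-26 | 2017-10-18 | キヤノン株式会社 | Information processing apparatus and information processing method |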
EP2904817A4 (en) * | 2012-10-01 | 2016-06-15 | Nokia Technologies Oy | APPARATUS AND METHOD FOR REPRODUCING RECORDED AUDIO DATA WITH CORRECT SPACE ORIENTATION |
US9271076B2 (en) * | 2012-11-08 | 2016-02-23 | Dsp Group Ltd. | Enhanced stereophonic audio recordings in handheld devices |
US9769588B2 (en) | 2012-11-20 | 2017-09-19 | Nokia Technologies Oy | Spatial audio enhancement apparatus |
CN103916723B (zh) * | 2013-01-08 | 2018-08-10 | 联想(北京)有限公司 | Sound collection method and electronic device |
US9483228B2 (en) | 2013-08-26 | 2016-11-01 | Dolby Laboratories Licensing Corporation | Live engine |
US9402095B2 (en) * | 2013-11-19 | 2016-07-26 | Nokia Technologies Oy | Method and apparatus for calibrating an audio playback system |
CN104735582B (zh) * | 2013-12-20 | 2018-09-07 | 华为技术有限公司 | Sound signal processing method, apparatus and device |
KR102605480B1 (ko) * | 2014-11-28 | 2023-11-24 | 소니그룹주식회사 | Transmission device, transmission method, reception device, and reception method |
US20170188140A1 (en) * | 2015-12-24 | 2017-06-29 | Intel Corporation | Controlling audio beam forming with video stream data |
US9756421B2 (en) * | 2016-01-22 | 2017-09-05 | Mediatek Inc. | Audio refocusing methods and electronic devices utilizing the same |
CN105761721A (zh) * | 2016-03-16 | 2016-07-13 | 广东佳禾声学科技有限公司 | Speech coding method carrying position information |
CN106774930A (zh) * | 2016-12-30 | 2017-05-31 | 中兴通讯股份有限公司 | Data processing method and apparatus, and acquisition device |
US20180220252A1 (en) * | 2017-01-31 | 2018-08-02 | Microsoft Technology Licensing, Llc | Spectator audio and video repositioning |
US10251013B2 (en) | 2017-06-08 | 2019-04-02 | Microsoft Technology Licensing, Llc | Audio propagation in a virtual environment |
US10447394B2 (en) * | 2017-09-15 | 2019-10-15 | Qualcomm Incorporated | Connection with remote internet of things (IoT) device based on field of view of camera |
BR112020021608A2 (pt) | 2018-04-24 | 2021-01-26 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | aparelho e método para renderização de um sinal de áudio para uma reprodução para um usuário |
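CN109660911A (zh) | 2018-11-27 | 2019-04-19 | Oppo广东移动通信有限公司 | Recording sound effect processing method and apparatus, mobile terminal, and storage medium |
CN111508507B (zh) * | 2019-01-31 | 2023-03-03 | 华为技术有限公司 | Audio signal processing method and apparatus |
CN111131616B (zh) * | 2019-12-28 | 2022-05-17 | 科大讯飞股份有限公司 | Audio sharing method based on an intelligent terminal and related apparatus |
CN111654806B (zh) * | 2020-05-29 | 2022-01-07 | Oppo广东移动通信有限公司 | Audio playback method and apparatus, storage medium, and electronic device |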
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08289275A (ja) * | 1995-04-17 | 1996-11-01 | Canon Inc | TV conference system |
WO2003017680A1 (en) * | 2001-08-15 | 2003-02-27 | Koninklijke Philips Electronics N.V. | 3d video conferencing system |
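CN1717955A (zh) * | 2002-12-02 | 2006-01-04 | 汤姆森许可贸易公司 | Method for describing the composition of audio signals |
CN1922655A (zh) * | 2004-07-06 | 2007-02-28 | 松下电器产业株式会社 | Audio signal encoding device, audio signal decoding device, method, and program |
CN101350931A (zh) * | 2008-08-27 | 2009-01-21 | 深圳华为通信技术有限公司 | Method and apparatus for generating and playing audio signals, and system for processing audio signals |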
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5714997A (en) * | 1995-01-06 | 1998-02-03 | Anderson; David P. | Virtual reality television system |
US6731334B1 (en) * | 1995-07-31 | 2004-05-04 | Forgent Networks, Inc. | Automatic voice tracking camera system and method of operation |
US5778082A (en) * | 1996-06-14 | 1998-07-07 | Picturetel Corporation | Method and apparatus for localization of an acoustic source |
US6829018B2 (en) * | 2001-09-17 | 2004-12-07 | Koninklijke Philips Electronics N.V. | Three-dimensional sound creation assisted by visual information |
US6813360B2 (en) * | 2002-01-22 | 2004-11-02 | Avaya, Inc. | Audio conferencing with three-dimensional audio encoding |
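JP2007158527A (ja) * | 2005-12-01 | 2007-06-21 | Sony Corp | Signal processing device, signal processing method, reproduction device, and recording device |
JP4929740B2 (ja) | 2006-01-31 | 2012-05-09 | ヤマハ株式会社 | Audio conference device |
JP2008151766A (ja) | 2006-11-22 | 2008-07-03 | Matsushita Electric Ind Co Ltd | Stereophonic sound control device and stereophonic sound control method |
CN101668317B (zh) | 2008-09-04 | 2012-07-11 | 华为技术有限公司 | Method, system and apparatus for network resource reservation |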
2008
- 2008-08-27 CN CN2008101191405A patent/CN101350931B/zh active Active
2009
- 2009-08-21 EP EP09809218.2A patent/EP2323425B1/en active Active
- 2009-08-21 EP EP17187688.1A patent/EP3319344B1/en active Active
- 2009-08-21 WO PCT/CN2009/073406 patent/WO2010022633A1/zh active Application Filing
2011
- 2011-02-25 US US13/035,400 patent/US8705778B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP2323425A4 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8705778B2 (en) | 2008-08-27 | 2014-04-22 | Huawei Technologies Co., Ltd. | Method and apparatus for generating and playing audio signals, and system for processing audio signals |
US10026452B2 (en) | 2010-06-30 | 2018-07-17 | Warner Bros. Entertainment Inc. | Method and apparatus for generating 3D audio positioning using dynamically optimized audio 3D space perception cues |
US10326978B2 (en) | 2010-06-30 | 2019-06-18 | Warner Bros. Entertainment Inc. | Method and apparatus for generating virtual or augmented reality presentations with 3D audio positioning |
US10453492B2 (en) | 2010-06-30 | 2019-10-22 | Warner Bros. Entertainment Inc. | Method and apparatus for generating encoded content using dynamically optimized conversion for 3D movies |
US10819969B2 (en) | 2010-06-30 | 2020-10-27 | Warner Bros. Entertainment Inc. | Method and apparatus for generating media presentation content with environmentally modified audio components |
WO2012037073A1 (en) | 2010-09-13 | 2012-03-22 | Warner Bros. Entertainment Inc. | Method and apparatus for generating 3d audio positioning using dynamically optimized audio 3d space perception cues |
EP2719196A1 (en) * | 2010-09-13 | 2014-04-16 | Warner Bros. Entertainment Inc. | Method and apparatus for generating 3d audio positioning using dynamically optimized audio 3d space perception cues |
EP2719196A4 (en) * | 2010-09-13 | 2016-09-14 | Warner Bros Entertainment Inc | METHOD AND APPARATUS FOR GENERATING THREE-DIMENSIONAL AUDIO POSITIONING USING DYNAMICALLY OPTIMIZED AUDIO THREE-DIMENSIONAL SPACE PERCEPTION PATTERNS |
EP3379533A3 (en) * | 2010-09-13 | 2019-03-06 | Warner Bros. Entertainment Inc. | Method and apparatus for generating 3d audio positioning using dynamically optimized audio 3d space perception cues |
EP2566194A1 (en) * | 2010-11-26 | 2013-03-06 | Huawei Device Co., Ltd. | Method and device for processing audio in video communication |
EP2566194A4 (en) * | 2010-11-26 | 2013-08-21 | Huawei Device Co Ltd | METHOD AND DEVICE FOR AUDIO PROCESSING IN VIDEO COMMUNICATION |
US9113034B2 (en) | 2010-11-26 | 2015-08-18 | Huawei Device Co., Ltd. | Method and apparatus for processing audio in video communication |
Also Published As
Publication number | Publication date |
---|---|
EP2323425B1 (en) | 2017-11-15 |
EP3319344B1 (en) | 2022-10-26 |
EP3319344A1 (en) | 2018-05-09 |
US20110164769A1 (en) | 2011-07-07 |
EP2323425A4 (en) | 2012-09-12 |
EP2323425A1 (en) | 2011-05-18 |
CN101350931B (zh) | 2011-09-14 |
CN101350931A (zh) | 2009-01-21 |
US8705778B2 (en) | 2014-04-22 |
Similar Documents
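Publication | Title |
---|---|
WO2010022633A1 (zh) | Method and apparatus for generating and playing audio signals, and audio signal processing system |
US11082662B2 | Enhanced audiovisual multiuser communication |
US8115799B2 | Method and apparatus for obtaining acoustic source location information and a multimedia communication system |
US8970704B2 | Network synchronized camera settings |
US9113034B2 | Method and apparatus for processing audio in video communication |
EP2352290B1 | Method and apparatus for matching audio and video signals during a videoconference |
WO2018082284A1 | 3D panoramic audio/video live streaming system and audio/video capture method |
CN109413359B | Camera tracking method, apparatus and device |
US20100103244A1 | Device for and method of processing image data representative of an object |
WO2010022658A1 | Method, apparatus and system for sending and playing multi-view media content |
WO2013178188A1 | Video conference display method and device |
WO2009076853A1 | Stereoscopic video communication terminal, system and method |
WO2015070558A1 | Method and device for controlling video shooting |
CN109257559A | Image display method and device for panoramic video conferencing, and video conferencing system |
WO2012072008A1 | Method and device for superimposing auxiliary information on a video signal |
WO2011153907A1 | Method and device for playing the audio of remote conference participants, and remote video conferencing system |
US11516433B1 | Representation and compression of gallery view for video conferencing |
JP3488096B2 | Face image control method in a three-dimensional shared virtual space communication service, three-dimensional shared virtual space communication device, and program recording medium therefor |
CN109756683B | Panoramic audio/video recording method, apparatus, storage medium and computer device |
JP2006128816A | Recording program, playback program, recording device, playback device and recording medium supporting stereoscopic video and stereophonic sound |
WO2015090039A1 | Sound processing method, apparatus and device |
US11076224B2 | Processing of data of a video sequence in order to zoom to a speaker detected in the sequence |
JP2006339869A | Device for integrating video signals and audio signals |
CN207589058U | Split projection system for panoramic video |
TW202231061A | Video conferencing system and method, sensing device, and interface generation method |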
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 09809218; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
REEP | Request for entry into the European phase | Ref document number: 2009809218; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 2009809218; Country of ref document: EP |