WO2015086894A1 - Audio scene capture apparatus - Google Patents

Audio scene capture apparatus

Info

Publication number
WO2015086894A1
WO2015086894A1 (PCT/FI2014/050951)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
sound level
audio signal
value
positional measure
Prior art date
Application number
PCT/FI2014/050951
Other languages
English (en)
Inventor
Juha OJANPERÄ
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy
Publication of WO2015086894A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/30 Determining absolute distances from a plurality of spaced points of known location
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • the present application relates to apparatus for the processing of audio and additionally video signals.
  • the invention further relates to, but is not limited to, apparatus for processing audio and additionally video signals from mobile devices.
  • Multiple 'feeds' may be found in sharing services for video and audio signals (such as those employed by YouTube).
  • Such systems are known and widely used to share user generated content recorded and uploaded or up-streamed to a server and then downloaded or down-streamed to a viewing/listening user.
  • Such systems rely on users recording and uploading or up-streaming a recording of an event using the recording facilities at hand to the user. This may typically be in the form of the camera and microphone arrangement of a mobile device such as a mobile phone.
  • the viewing/listening end user may then select one of the up-streamed or uploaded data to view or listen.
  • it can be possible to generate a "three dimensional" rendering of the event by combining various different recordings from different users, or to improve upon user generated content from a single source, for example reducing background noise by mixing different users' content to attempt to overcome local interference or uploading errors.
  • There can be a problem in multiple recording systems where the recording devices are in close proximity and the same audio scene is recorded multiple times.
  • a method comprising receiving a value indicative of a sound level associated with an audio signal received by an audio apparatus; receiving at least one further value indicative of a sound level associated with at least one further audio signal received by at least one further audio apparatus; determining a positional measure for the audio apparatus, wherein the positional measure is relative to the position of the source of the audio signal, and wherein the positional measure is dependent on the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal; and instructing the audio apparatus to capture the audio signal dependent on whether the positional measure is greater than a threshold value.
  • the method may further comprise determining at least one further positional measure for the at least one further audio apparatus, wherein the at least one further positional measure is relative to the position of the source of the at least one further audio signal.
  • the threshold value may be determined by: determining a maximum positional measure value from the positional measure for the audio apparatus and the at least one further positional measure for the at least one further audio apparatus; and determining the difference between the maximum positional measure value and the standard deviation of the positional measure and the at least one further positional measure.
  • Determining the positional measure for the audio apparatus may comprise: determining for each of a plurality of instances in time a maximum sound level value by selecting a maximum value from the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal for each of the plurality of instances in time; determining a relative average sound level for each of the plurality of instances in time by normalising the value indicative of the sound level with the maximum sound level value for each of the plurality of instances in time; and summing the plurality of instances in time of the relative average sound level.
  • the value indicative of the sound level may be an average sound level at an instance in time.
  • the average sound level for the instance of time may comprise a plurality of absolute value samples of the received audio signal averaged over a period of time spanning the instance of time.
  • the audio signal and the at least one further audio signal may be audio signals of an audio scene.
  • the method may further comprise clustering an identifier associated with the audio apparatus and an identifier associated with the at least one further audio apparatus into a subgroup of identifiers of audio apparatuses within a common geographic location within the audio scene.
  • an apparatus configured to: receive a value indicative of a sound level associated with an audio signal received by an audio apparatus; receive at least one further value indicative of a sound level associated with at least one further audio signal received by at least one further audio apparatus; determine a positional measure for the audio apparatus, wherein the positional measure is relative to the position of the source of the audio signal, and wherein the positional measure is dependent on the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal; and instruct the audio apparatus to capture the audio signal dependent on whether the positional measure is greater than a threshold value.
  • the apparatus may be further configured to determine at least one further positional measure for the at least one further audio apparatus, wherein the at least one further positional measure is relative to the position of the source of the at least one further audio signal.
  • the threshold value may be determined by the apparatus being configured to: determine a maximum positional measure value from the positional measure for the audio apparatus and the at least one further positional measure for the at least one further audio apparatus; and determine the difference between the maximum positional measure value and the standard deviation of the positional measure and the at least one further positional measure.
  • the apparatus configured to determine the positional measure for the audio apparatus may be further configured to: determine for each of a plurality of instances in time a maximum sound level value by selecting a maximum value from the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal for each of the plurality of instances in time; determine a relative average sound level for each of the plurality of instances in time by normalising the value indicative of the sound level with the maximum sound level value for each of the plurality of instances in time; and sum the plurality of instances in time of the relative average sound level.
  • the value indicative of the sound level may be an average sound level at an instance in time.
  • the average sound level for the instance of time may comprise a plurality of absolute value samples of the received audio signal averaged over a period of time spanning the instance of time.
  • the audio signal and the at least one further audio signal may be audio signals of an audio scene.
  • the apparatus may be further configured to cluster an identifier associated with the audio apparatus and an identifier associated with the at least one further audio apparatus into a subgroup of identifiers of audio apparatuses within a common geographic location within the audio scene.
  • an apparatus comprising at least one processor and at least one memory including computer code, the at least one memory and the computer code configured to with the at least one processor cause the apparatus to: receive a value indicative of a sound level associated with an audio signal received by an audio apparatus; receive at least one further value indicative of a sound level associated with at least one further audio signal received by at least one further audio apparatus; determine a positional measure for the audio apparatus, wherein the positional measure is relative to the position of the source of the audio signal, and wherein the positional measure is dependent on the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal; and instruct the audio apparatus to capture the audio signal dependent on whether the positional measure is greater than a threshold value.
  • the apparatus may be further caused to determine at least one further positional measure for the at least one further audio apparatus, wherein the at least one further positional measure is relative to the position of the source of the at least one further audio signal.
  • the threshold value may be determined by the apparatus being caused to: determine a maximum positional measure value from the positional measure for the audio apparatus and the at least one further positional measure for the at least one further audio apparatus; and determine the difference between the maximum positional measure value and the standard deviation of the positional measure and the at least one further positional measure.
  • the apparatus caused to determine the positional measure for the audio apparatus may be further caused to: determine for each of a plurality of instances in time a maximum sound level value by selecting a maximum value from the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal for each of the plurality of instances in time; determine a relative average sound level for each of the plurality of instances in time by normalising the value indicative of the sound level with the maximum sound level value for each of the plurality of instances in time; and sum the plurality of instances in time of the relative average sound level.
  • the value indicative of the sound level may be an average sound level at an instance in time.
  • the average sound level for the instance of time may comprise a plurality of absolute value samples of the received audio signal averaged over a period of time spanning the instance of time.
  • the audio signal and the at least one further audio signal may be audio signals of an audio scene.
  • the apparatus may be further caused to cluster an identifier associated with the audio apparatus and an identifier associated with the at least one further audio apparatus into a subgroup of identifiers of audio apparatuses within a common geographic location within the audio scene.
  • a computer program code realizing the following when executed by a processor: receiving a value indicative of a sound level associated with an audio signal received by an audio apparatus; receiving at least one further value indicative of a sound level associated with at least one further audio signal received by at least one further audio apparatus; determining a positional measure for the audio apparatus, wherein the positional measure is relative to the position of the source of the audio signal, and wherein the positional measure is dependent on the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal; and instructing the audio apparatus to capture the audio signal dependent on whether the positional measure is greater than a threshold value.
  • An electronic device may comprise apparatus as described above.
  • a chipset may comprise apparatus as described above.
  • Figure 1 shows schematically a multi-user free-viewpoint service sharing system which may encompass embodiments of the application
  • Figure 2 shows schematically an apparatus suitable for being employed in embodiments of the application
  • Figure 3 shows schematically an audio scene capture apparatus according to some embodiments
  • Figure 4 shows schematically a device selection analysis processor in relation to a number of audio scene capture apparatuses according to some embodiments.
  • Figure 5 shows schematically a method of operation of the device selection analysis processor shown in Figure 4 according to some embodiments.
  • audio signals and audio capture, uploading and downloading are described herein. However it would be appreciated that in some embodiments the audio signal/audio capture, uploading and downloading is one part of an audio-video system.
  • the audio space 1 can have located within it at least one recording or capturing device or apparatus 19, arbitrarily positioned within the audio space to record suitable audio scenes.
  • the apparatus shown in Figure 1 are represented as microphones with a polar gain pattern 801 showing the directional audio capture gain associated with each apparatus.
  • the apparatus 19 in Figure 1 are shown such that some of the apparatus are capable of attempting to capture the audio scene or activity 803 within the audio space.
  • the activity 803 can be any event the user of the apparatus wishes to capture. For example the event could be a music event or audio of a news worthy event.
  • although the apparatus 19 is shown having a directional microphone gain pattern 801, it would be appreciated that in some embodiments the microphone or microphone array of the recording apparatus 19 has an omnidirectional gain or a different gain profile to that shown in Figure 1.
  • Each recording apparatus 19 can in some embodiments transmit or alternatively store for later consumption the captured audio signals via a transmission channel 807 to an audio scene server 809.
  • the recording apparatus 19 in some embodiments can encode the audio signal to compress the audio signal in a known way in order to reduce the bandwidth required in "uploading" the audio signal to the audio scene server 809.
  • the recording apparatus 19 in some embodiments can be configured to estimate and upload via the transmission channel 807 to the audio scene server 809 an estimation of the location and/or the orientation or direction of the apparatus.
  • the position information can be obtained, for example, using GPS coordinates, cell-ID or a-GPS or any other suitable location estimation methods and the orientation/direction can be obtained, for example using a digital compass, accelerometer, or gyroscope information.
  • the recording device or apparatus 19 can be configured to capture or record one or more audio signals, for example the apparatus in some embodiments has multiple microphones each configured to capture the audio signal from a different direction. In such embodiments the recording device or apparatus 19 can record and provide more than one signal from the different directions/orientations and further supply position/direction information for each signal.
  • the recording device or apparatus 19 may be further configured to perform a function of audio signal level monitoring in which the sound level of the audio scene may be monitored via the captured audio signal.
  • The capturing and monitoring of the audio signal level is shown in Figure 1 by step 1001.
  • this step in Figure 1 may also comprise estimating the position/direction of the apparatus.
  • the uploading of the audio, audio signal level and position/direction estimate to the audio scene server is shown in Figure 1 by step 1003.
  • the audio scene server 809 furthermore can in some embodiments communicate via a further transmission channel 811 to a listening device 813.
  • the listening device 813, which is represented in Figure 1 by a set of headphones, can prior to or during downloading via the further transmission channel 811 select a listening point, in other words select a position such as indicated in Figure 1 by the selected listening point 805.
  • the listening device 813 can communicate the request via the further transmission channel 811 to the audio scene server 809.
  • the selection of the listening point 805, may be dependent on the audio signal level as monitored by the recording apparatus 19 and delivered to the listening device 813.
  • the selection of the listening point 805 may be dependent on the audio signal level as monitored by the recording apparatus 19 and made by the audio scene server 809.
  • the selection of the listening point 805 may comprise instructing one or more recording devices 19 in the vicinity of the selected point 805 to capture the audio signal.
  • the one or more recording devices 19 may be selected according to the audio signal level as monitored by each of the one or more recording devices 19.
  • the selection of a listening position by the listening device 813 is shown in Figure 1 by step 1005.
  • the audio scene server 809 can as discussed above in some embodiments receive from each of the recording apparatus 19 an approximation or estimation of the location and/or direction of the recording apparatus 19.
  • the audio scene server 809 can in some embodiments produce from the various captured audio signals from the recording apparatus 19 a composite audio signal representing the desired listening position, and the composite audio signal can be passed via the further transmission channel 811 to the listening device 813.
  • the audio scene server 809 can be configured to select captured audio signals from the apparatus "closest" to the desired or selected listening point, and to transmit these to the listening device 813 via the further transmission channel 811.
  • the generation or supply of a suitable audio signal based on the selected listening position indicator is shown in Figure 1 by step 1007.
  • the listening device 813 can request a multiple channel audio signal or a mono-channel audio signal. This request can in some embodiments be received by the audio scene server 809 which can generate the requested multiple channel data.
  • the audio scene server 809 in some embodiments can receive each uploaded audio signal and can keep track of the positions and the associated direction/orientation associated with each audio signal. In some embodiments the audio scene server 809 can provide a high level coordinate system which corresponds to locations where the uploaded/upstreamed content source is available to the listening device 813.
  • the audio scene server 809 may be responsible for determining and selecting the listening position or capturing device.
  • the listening device/end user can be configured to select or determine other aspects of the desired audio signal, for example signal quality, number of channels of audio desired, etc.
  • the audio scene server 809 can provide in some embodiments a selected set of downmixed signals which correspond to listening points neighbouring the desired location/direction, and the listening device 813 selects the audio signal desired.
  • Figure 2 shows a schematic block diagram of an exemplary apparatus or electronic device 10, which may be used to record (or operate as a recording device 19) or listen (or operate as a listening device 813) to the audio signals (and similarly to record or view the audio-visual images and data).
  • the electronic device 10 may for example be a mobile terminal or user equipment of a wireless communication system.
  • the apparatus can be an audio player or audio recorder, such as an MP3 player, a media recorder/player (also known as an MP4 player), or any suitable portable device suitable for recording audio or audio/video camcorder/memory audio or video recorder.
  • the apparatus can in some embodiments comprise an audio subsystem.
  • the audio subsystem for example can comprise in some embodiments a microphone or array of microphones 11 for audio signal capture.
  • the microphone or array of microphones can be a solid state microphone, in other words capable of capturing audio signals and outputting a suitable digital format signal.
  • the microphone or array of microphones 11 can comprise any suitable microphone or audio capture means, for example a condenser microphone, capacitor microphone, electrostatic microphone, electret condenser microphone, dynamic microphone, ribbon microphone, carbon microphone, piezoelectric microphone, or microelectrical-mechanical system (MEMS) microphone.
  • the microphone 11 or array of microphones can in some embodiments output the captured audio signal to an analogue-to-digital converter (ADC) 14.
  • the apparatus can further comprise an analogue-to-digital converter (ADC) 14 configured to receive the analogue captured audio signal from the microphones and to output the captured audio signal in a suitable digital form.
  • the analogue-to-digital converter 14 can be any suitable analogue-to-digital conversion or processing means.
  • the apparatus 10 audio subsystem further comprises a digital-to-analogue converter 32 for converting digital audio signals from a processor 21 to a suitable analogue format.
  • the digital-to-analogue converter (DAC) or signal processing means 32 can in some embodiments be any suitable DAC technology.
  • the audio subsystem can comprise in some embodiments a speaker 33.
  • the speaker 33 can in some embodiments receive the output from the digital-to-analogue converter 32 and present the analogue audio signal to the user.
  • the speaker 33 can be representative of a headset, for example a set of headphones, or cordless headphones.
  • the apparatus 10 can comprise one or the other of the audio capture and audio presentation parts of the audio subsystem, such that in some embodiments of the apparatus only the microphone (for audio capture) or only the speaker (for audio presentation) is present.
  • the apparatus 10 comprises a processor 21.
  • the processor 21 is coupled to the audio subsystem, and specifically in some examples the analogue-to-digital converter 14 for receiving digital signals representing audio signals from the microphone 11, and the digital-to-analogue converter (DAC) 32 configured to output processed digital audio signals.
  • the processor 21 can be configured to execute various program codes.
  • the implemented program codes can comprise for example audio encoding code routines.
  • the apparatus further comprises a memory 22.
  • the processor is coupled to memory 22.
  • the memory can be any suitable storage means.
  • the memory 22 comprises a program code section 23 for storing program codes implementable upon the processor 21 .
  • the memory 22 can further comprise a stored data section 24 for storing data, for example data that has been encoded in accordance with the application or data to be encoded via the application embodiments as described later.
  • the implemented program code stored within the program code section 23, and the data stored within the stored data section 24 can be retrieved by the processor 21 whenever needed via the memory-processor coupling.
  • the apparatus 10 can comprise a user interface 15.
  • the user interface 15 can be coupled in some embodiments to the processor 21 .
  • the processor can control the operation of the user interface and receive inputs from the user interface 15.
  • the user interface 15 can enable a user to input commands to the electronic device or apparatus 10, for example via a keypad, and/or to obtain information from the apparatus 10, for example via a display which is part of the user interface 15.
  • the user interface 15 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the apparatus 10 and further displaying information to the user of the apparatus 10.
  • the apparatus further comprises a transceiver 13, the transceiver in such embodiments can be coupled to the processor and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
  • the transceiver 13 or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
  • the coupling can, as shown in Figure 1, be the transmission channel 807 (where the apparatus is functioning as the recording device 19) or the further transmission channel 811 (where the device is functioning as the listening device 813).
  • the transceiver 13 can communicate with further devices by any suitable known communications protocol, for example in some embodiments the transceiver 13 or transceiver means can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
  • the apparatus comprises a position sensor 16 configured to estimate the position of the apparatus 10.
  • the position sensor 16 can in some embodiments be a satellite positioning sensor such as a GPS (Global Positioning System), GLONASS or Galileo receiver.
  • the positioning sensor can be a cellular ID system or an assisted GPS system.
  • the apparatus 10 further comprises a direction or orientation sensor.
  • the orientation/direction sensor can in some embodiments be an electronic compass, accelerometer, a gyroscope or be determined by the motion of the apparatus using the positioning estimate. It is to be understood again that the structure of the electronic device 10 could be supplemented and varied in many ways.
  • the above apparatus 10 in some embodiments can be operated as an audio scene server 809.
  • the audio scene server 809 can comprise a processor, memory and transceiver combination.
  • the apparatus 10 is shown in further detail with respect to an audio scene capture apparatus 100 in Figure 3.
  • the audio scene capture apparatus 100 can comprise in some embodiments an audio scene controller 101 or suitable controlling means.
  • the audio scene controller 101 is configured to control the operation of the audio scene capture operations.
  • the audio scene controller 101 may be configured to control the audio scene recorder/encoder 105.
  • the audio scene controller 101 may be configured to receive a recording request message via the transceiver circuitry 13 instructing the audio scene capture apparatus 100 to capture or record the audio events around the apparatus.
  • the audio scene capture apparatus 100 can also comprise an audio signal level monitor 107 which may be arranged to monitor the sound (or audio signal) levels of the audio scene around the audio scene capture apparatus 100. As depicted in Figure 3 the audio signal associated with the audio scene may be received by the audio signal level monitor 107 via the microphone. In some embodiments the audio signal level monitor 107 may continuously monitor the audio signal by converting the received audio signal into a value which indicates the average sound level at a particular instance in time.
  • the received audio signal, which may be associated with the audio scene around the audio scene capture apparatus 100, may be converted into the value indicative of the average sound level at a particular time instance t by using the following expression: $X_L(t) = \frac{1}{t_e(t) - t_s(t) + 1} \sum_{i = t_s(t)}^{t_e(t)} \lvert x(i) \rvert$, where $X_L(t)$ denotes the value indicative of the average sound level at a time instance t, $x$ is the audio signal around the audio scene capture apparatus 100, and $t_e(t)$ and $t_s(t)$ denote functions which determine the number of time samples around the time instance t. In other words the functions $t_e(t)$ and $t_s(t)$ determine the time window of samples around the time instance t over which the value indicative of the average sound level $X_L(t)$ is determined.
  • This has the advantage of smoothing out any rapid fluctuations in the value of $X_L(t)$.
  • the averaging process may have the effect of low-pass filtering the continuously generated values of $X_L(t)$.
  • the audio signal level monitor 107 may be arranged to generate values of $X_L(t)$ continuously at a rate which is lower than the sampling rate of the audio signal $x(i)$.
  • a value of $X_L(t)$ may be generated for every N samples of the audio signal $x(i)$.
  • the value of $X_L(t)$ may be calculated at a rate of every ten or so milliseconds of the audio signal $x(i)$. This has the advantage of using fewer processing instruction cycles to determine the value indicative of the average sound level $X_L(t)$, and the further advantage of using minimal transmission bandwidth to send said values for further processing, as in the sketch below.
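For illustration, a minimal Python sketch of such a level monitor, emitting one average-level value per fixed hop of samples by averaging absolute sample values over a window spanning each time instance; the window length `win`, hop size `hop` (roughly 10 ms at 44.1 kHz), and function name are assumptions made for this example, not values given in the description:

```python
import numpy as np

def average_sound_levels(x, win=1024, hop=441):
    """Compute X_L(t): the mean absolute sample value over a window
    spanning each time instance, emitted every `hop` samples so the
    output rate is far below the audio sampling rate."""
    levels = []
    for centre in range(0, len(x), hop):
        ts = max(0, centre - win // 2)       # t_s(t): window start
        te = min(len(x), centre + win // 2)  # t_e(t): window end
        levels.append(np.mean(np.abs(x[ts:te])))
    return np.asarray(levels)
```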
  • the audio scene capture apparatus 100 may be arranged to send the values indicative of the average sound level for instances of time t to a further entity such as a device selection analysis processor 200.
  • this data may be sent to a further entity over a control channel or broadcast/data channel of a communication system, for example the transmission channel 807 in Figure 1.
  • the audio signal level monitor 107 may comprise the means for sending a value indicative of the sound level which may be an average sound level at an instance in time.
  • the average sound level for the instance of time may comprise a plurality of absolute value samples of the received audio signal averaged over a period of time spanning the instance in time.
  • the apparatus 10 is shown in further detail with respect to device selection analysis processor embodiments of the application in Figure 4.
  • a number of audio scene capture apparatuses 100_1 to 100_N is shown, in which each of the audio scene capture apparatuses 100_1 to 100_N is communicatively connected to the device selection analysis processor 200 via a communication channel 201, such as the transmission channels 807 or 811.
  • the operation of the device selection analysis processor 200 is shown in Figure 5.
  • the device selection analysis processor 200 may reside in or form part of the functionality of the audio scene server 809.
  • the device selection analysis processor 200 may be configured to receive average sound level information from a plurality of audio scene capture apparatuses positioned around the audio scene. The device selection processor 200 may then use this information in order to determine and accordingly instruct which of the plurality of audio scene capture apparatuses should be instructed to record the audio scene at any given time.
  • the device selection processor 200 may be arranged to select a number of the plurality of the audio scene capture apparatuses to capture audio signals received from the audio scene. Furthermore, the device selection processor 200 may continually monitor and process the received sound level information from each of the plurality of audio scene capture apparatuses in order to continuously adapt said selection. In other words the pattern of selected audio scene capture apparatuses may adapt according to the changing audio scene.
  • each audio scene capture apparatus 100_1 to 100_N may be required to register with the device selection analysis processor 200 in order to impart the identity and location estimate data of each audio scene capture apparatus within the audio scene.
  • the device selection analysis processor 200 may then use this information to arrange the identities of the various audio scene capture apparatuses 100_1 to 100_N into a number of different subgroups.
  • the audio scene capture apparatuses 100_1 to 100_N may be arranged according to their relative geographical position within the audio scene.
  • the positional information can be obtained by each audio scene capture apparatus 100_1 to 100_N by using any suitable location estimation method implemented on the apparatus 10, such as GPS coordinates, Cell-ID or A-GPS.
  • This positional information from each of the audio scene capture apparatuses 100_1 to 100_N may be uploaded to the device selection processor 200 via the communication channel 201.
  • the registering step of each of the audio scene apparatuses 100_1 to 100_N is depicted in Figure 5 as the processing step 501, in which the registration of each audio scene capture apparatus is received by the device selection analysis processor 200.
  • the device selection analysis processor 200 may be arranged to cluster the identities of the audio scene capture apparatuses 100_1 to 100_N according to the relative position of each audio scene capture apparatus within the audio scene, so that the identities of the audio scene capture apparatuses 100_1 to 100_N which lie within the same particular geographical region of the audio scene may be grouped into the same subgroup.
  • the device selection processor 200 may partition the audio scene according to the geographic location of each audio scene capture apparatus by dividing the identities associated with the audio scene apparatuses into a number of subgroups, where each subgroup may comprise the identities of the audio scene apparatuses 100_1 to 100_N which are within the same geographical region of the audio scene.
  • the step of clustering the identities of the plurality of audio scene capture apparatuses 100_1 to 100_N into subgroups of identities, where each subgroup comprises the identities of all audio scene capture apparatuses 100_1 to 100_N within the same geographic region of the audio scene, is shown as processing step 503 in Figure 5 and sketched below.
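The description does not prescribe a particular clustering method; a minimal sketch assuming a fixed-size grid over the reported position estimates, where `cell_deg` (roughly a hundred metres) is an assumed resolution:

```python
from collections import defaultdict

def cluster_by_location(registrations, cell_deg=0.001):
    """Group apparatus identities into subgroups, one per grid cell.

    `registrations` maps identity -> (latitude, longitude) as reported
    by each apparatus at registration."""
    subgroups = defaultdict(list)
    for ident, (lat, lon) in registrations.items():
        cell = (round(lat / cell_deg), round(lon / cell_deg))
        subgroups[cell].append(ident)
    return list(subgroups.values())
```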
  • the device selection analysis processor 200 may also be configured to receive the values indicative of the average sound level from the plurality of audio scene capture apparatuses 100_1 to 100_N positioned around the audio scene.
  • the step of the device selection analysis processor 200 receiving values indicative of the average sound level from the plurality of audio scene capture apparatuses 100_1 to 100_N is shown as processing step 505 in Figure 5.
  • the device selection analysis processor 200 may then analyse the values indicative of the average sound level from the plurality of audio scene capture apparatuses 100_1 to 100_N in order to determine which of the audio scene capture apparatuses 100_1 to 100_N are optimal for capturing/recording the audio scene.
  • the analysis of the values indicative of the average sound level may be performed on a subgroup-by-subgroup basis, in which all audio scene capture apparatuses whose identities have been assigned to a particular subgroup may be processed by the device selection analysis processor 200. Processing of the values indicative of the average sound level (or average sound level information) may be based on the general principle that the closer the audio capture apparatus is to a sound source, the better the quality of the received signal. This may be reflected in the values indicative of the average sound level, where an audio scene capture apparatus which is nearer the sound source may have a higher average sound level than an audio capture apparatus which is further away from the sound source.
  • the device selection analysis processor 200 can collate the values indicative of the average sound level from each audio scene capture apparatus within a subgroup. This may be repeated on a subgroup by subgroup basis.
  • the values indicative of the average sound level for each audio scene capture apparatus can be collected over a period of time and then buffered in order to form a calculation window. This may be performed in order to smooth out any sudden transient changes in values indicative of the average sound level.
  • the device selection analysis processor 200 may analyse the values indicative of the average sound level for all audio scene capture apparatuses within a subgroup in order to determine which at least one audio scene capture apparatus is to be used to capture the audio scene signal.
  • the device selection analysis processor 200 may analyse the values indicative of the average sound level for each audio scene capture apparatus 100 within the subgroup by determining a relative sound level of each audio scene capture apparatus 100 for each instance of time from the buffer.
  • the relative average sound level value of each audio scene capture apparatus 100 may be made relative to the value indicative of the sound level of the audio scene capture apparatus which is a maximum for all audio scene capture apparatuses in the same subgroup.
  • the determining of the relative average sound level may be performed on a time instance basis according to the number of time instances at which the values indicative of the average sound level were captured.
  • the value indicative of the average sound level value as measured at an audio scene capture apparatus is dependent on the position of said audio scene capture apparatus relative to the position of an audio source within the audio scene. Therefore the relative average sound level may be used to provide a measure of the relative position or distance between an audio scene capture apparatus and the source of an audio signal within an audio scene.
  • the relative average sound level may be expressed as $\mathrm{Sig}_m(t) = \frac{X_{L,m}(t)}{\max X_L(t)}$, where $\mathrm{Sig}_m(t)$ is the relative average sound level for the device m at a time instance t, $X_{L,m}(t)$ is the value indicative of the average sound level for the device m, N is the number of audio scene capture apparatuses in the same subgroup, and $\max X_L(t)$ is the maximum average sound level across all audio scene capture apparatuses at the time instance t, which may be given as $\mathrm{Max}(X_{L,m}(t))$ for $0 \le m < N$, where Max is a function which returns a maximum value.
  • the measure of the position of an audio scene capture apparatus m relative to the position of the audio source, in relation to the position of the other audio scene capture apparatuses in the same subgroup, averaged over the number of time sampling instances, may be expressed as $\mathrm{Pos}_m = \frac{1}{N_t} \sum_{t=0}^{N_t - 1} \mathrm{Sig}_m(t)$, where $\mathrm{Pos}_m$ is the measure of the position of the audio scene capture apparatus m relative to the position of the source of an audio signal of an audio scene for an audio scene capture apparatus m, and $N_t$ is the number of time instances over which the values indicative of the average sound level are buffered by the device selection processor 200.
  • the device selection processor 200 may comprise the means for determining a positional measure for the audio apparatus relative to the source of an audio signal which is part of an audio scene by determining for each of a plurality of instances in time a maximum sound level value by selecting a maximum value from the value indicative of the sound level associated with the audio signal and at least one further value indicative of the sound level associated with at least one further audio signal for each of the plurality of instances in time, and then determining a relative average sound level for each of the plurality of instances in time by normalising the value indicative of the sound level with the maximum sound level value for each of the plurality of instances in time, and summing the plurality of instances in time of the relative average sound level.
  • the step of determining the positional measure for an audio scene capture apparatus relative to the position of the audio source is shown as processing step 507 in Figure 5.
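A minimal Python sketch of this positional-measure computation; the array layout (one row per apparatus, one column per time instance) and the function name are assumptions made for the example:

```python
import numpy as np

def positional_measures(levels):
    """Compute Pos_m for every apparatus in one subgroup.

    `levels` holds the buffered values X_{L,m}(t) as an
    (N_devices, N_t) array."""
    levels = np.asarray(levels, dtype=float)
    sig = levels / levels.max(axis=0)   # Sig_m(t) = X_{L,m}(t) / max X_L(t)
    return sig.mean(axis=1)             # Pos_m: Sig_m(t) averaged over N_t
```

For example, `positional_measures([[0.9, 0.8], [0.3, 0.4]])` returns approximately `[1.0, 0.42]`, placing the first apparatus nearest the source.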
  • the device selection processor 200 may then be arranged to select a particular audio scene capture apparatus 100 from the same subgroup in order to capture/record the received audio signal of the audio scene.
  • the selection of the particular audio scene capture apparatus 100 for recording the audio scene may be based on the value of the relative positional measure $\mathrm{Pos}_m$ of each audio scene capture apparatus within the same subgroup.
  • the particular audio scene capture apparatus for capturing/recording the audio signal of the audio scene may be selected by comparing the relative positional measure $\mathrm{Pos}_m$ of each audio scene capture apparatus within the same subgroup against a threshold value thr, and selecting the audio scene capture apparatus which exceeds the threshold value.
  • the above audio scene capture apparatus selection step may be expressed as: select device m if $\mathrm{Pos}_m > thr$.
  • the above selection step may be performed for each audio scene capture apparatus m; in other words the above selection step may be repeated for all audio scene capture apparatuses in the same subgroup.
  • the threshold thr may be determined as the maximum relative positional measure maxPos within the subgroup (or group of audio scene capture apparatuses) minus the standard deviation $\sigma_{Pos}$ of all the relative positional measures in the subgroup (or group) of audio scene capture apparatuses, that is $thr = \mathrm{maxPos} - \sigma_{Pos}$.
  • the above selection step may result in two or more audio scene capture apparatuses being identified as having a relative positional measure value which is greater than the determined threshold value.
  • the device selection analysis processor 200 may be arranged to select all the audio scene capture apparatuses within the same subgroup which have a relative positional measure value greater than the threshold value for the task of capturing/recording the audio scene.
  • the device selection processor 200 may comprise the means for determining a threshold value by determining a maximum positional measure value from the positional measure for an audio apparatus relative to the source of an audio signal of the audio scene and at least one further positional measure for at least one further audio apparatus relative to the source of at least one further audio signal of the audio scene, and then determining the difference between the maximum positional measure value and the standard deviation of the positional measure and the at least one further positional measure.
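A sketch of this selection rule under the same assumptions, with the threshold computed as the maximum positional measure minus the standard deviation of all positional measures in the subgroup:

```python
import numpy as np

def select_devices(pos):
    """Return indices m of apparatuses with Pos_m > thr,
    where thr = maxPos - sigma_Pos for the subgroup."""
    pos = np.asarray(pos, dtype=float)
    thr = pos.max() - pos.std()
    return [m for m in range(len(pos)) if pos[m] > thr]
```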
  • the device selection processor may be configured to perform an additional processing step to ensure a single optimal audio scene capture apparatus may be selected.
  • the additional processing step may take the form of discarding audio scene capture apparatuses from the selection process which lie within the same compass region of the audio scene, but have a lower relative positional measure value when compared to another audio scene capture apparatus in the same compass region.
  • a first audio scene capture apparatus and a second audio scene capture apparatus may share the same compass plane quadrant in the audio scene.
  • if the relative positional measure value associated with the second audio scene capture apparatus is lower than the relative positional measure value of the first audio scene capture apparatus, the identity associated with the second audio scene capture apparatus may be removed from the audio scene capture device selection process for the subgroup.
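A sketch of this pruning step; the description does not say how the compass regions are anchored, so the quadrants here are taken about an assumed reference point, and the coordinate map is an illustrative input:

```python
def prune_by_quadrant(candidates, positions, pos_measure, centre=(0.0, 0.0)):
    """Keep, per compass-plane quadrant, only the candidate with the
    highest positional measure.  `positions` maps identity -> (x, y)
    coordinates within the audio scene; `centre` is assumed."""
    best = {}
    for ident in candidates:
        x, y = positions[ident]
        quad = (x >= centre[0], y >= centre[1])   # which quadrant
        if quad not in best or pos_measure[ident] > pos_measure[best[quad]]:
            best[quad] = ident
    return list(best.values())
```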
  • the step of selecting an optimal audio scene capture apparatus from a subgroup for the purpose of capturing of the audio scene is shown as processing step 509 in Figure 5.
  • the device selection analysis processor 200 may be arranged to repeat the above processing steps 501 to 509 for all subgroups.
  • the device selection analysis processor 200 may select for each subgroup of identified audio scene apparatuses at least one audio scene capture apparatus for the capture/recording of the surrounding audio scene.
  • the selected at least one audio scene capture apparatus from each subgroup may then be instructed to capture/record the audio signal associated with the surrounding audio scene by the device selection analysis processor 200.
  • the selection of the audio scene capture apparatus is an ongoing and continual process, whereby the device selection analysis processor 200 may continually monitor the average sound level values from each audio scene capture apparatus within the subgroup.
  • the device selection analysis processor 200 may then repeat the above processing steps 501 to 509 periodically in order to determine whether there is another audio scene capture apparatus from the subgroup that may be more optimal in terms of the above relative positional measure calculation.
  • if it is determined at processing step 509 that another audio scene capture apparatus is more optimal for capturing/recording the audio scene, the device selection processor 200 may instruct the previous audio scene capture apparatus to stop capturing/recording the surrounding audio scene and instruct the newly selected audio scene capture apparatus to commence capturing/recording of the audio scene signal. As before this may be repeated for each subgroup.
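Putting the steps together, an illustrative and deliberately simplified control loop for this ongoing re-selection; `fetch_levels` and `send_command` are assumed transport hooks, `positional_measures` is the sketch above, and for brevity a single best apparatus per subgroup is kept by maximum rather than by the threshold test:

```python
import time

def reselection_loop(subgroups, fetch_levels, send_command, period_s=5.0):
    """Periodically recompute Pos_m per subgroup and hand the
    recording role to the best-placed apparatus."""
    active = {}                                # subgroup index -> identity
    while True:
        for g, idents in enumerate(subgroups):
            levels = fetch_levels(idents)      # buffered X_{L,m}(t) values
            pos = positional_measures(levels)
            best = idents[int(pos.argmax())]
            if active.get(g) != best:
                if g in active:
                    send_command(active[g], "stop")
                send_command(best, "record")   # commence capture
                active[g] = best
        time.sleep(period_s)
```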
  • processing steps 501 to 509 may be performed for the group of all audio scene apparatuses within the audio scene. In other words for these embodiments there is no partitioning or clustering of the audio scene capture apparatuses into sub groups.
  • an apparatus comprising: means for receiving a value indicative of a sound level, in which the value indicative of the sound level is associated with an audio signal received by an audio apparatus; means for receiving at least one further value indicative of a sound level associated with at least one further audio signal received by at least one further audio apparatus; means for determining a positional measure for the audio apparatus, in which the positional measure is relative to the position of the source of the audio signal, and in which the positional measure is dependent on the value indicative of the sound level associated with the audio signal and the at least one further value indicative of the sound level associated with the at least one further audio signal; and means for instructing the audio apparatus to capture the audio signal dependent on whether the positional measure is greater than a threshold value.
  • the apparatus may comprise means for determining at least one further positional measure for the at least one further audio apparatus, in which the further positional measure is relative to the position of the source of the at least one further audio signal.
  • embodiments may also be applied to audio-video signals where the audio signal components of the recorded data are processed in terms of the determining of the base signal and the determination of the time alignment factors for the remaining signals and the video signal components may be synchronised using the above embodiments of the invention.
  • the video parts may be synchronised using the audio synchronisation information.
  • user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, or CD.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate. Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Telephone Function (AREA)

Abstract

The present invention concerns, inter alia, a method comprising: receiving a value indicative of a sound level associated with an audio signal received by an audio apparatus; receiving at least one further value indicative of a sound level associated with at least one further audio signal received by at least one further audio apparatus; determining a positional measure for the audio apparatus, the positional measure being relative to the position of the source of the audio signal and dependent on the value indicative of the sound level associated with the audio signal and on the at least one further value indicative of the sound level associated with the at least one further audio signal; and instructing the audio apparatus to capture the audio signal according to whether or not the positional measure is greater than a threshold value.
PCT/FI2014/050951 2013-12-10 2014-12-03 Audio scene capture apparatus WO2015086894A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1321765.8 2013-12-10
GB1321765.8A GB2521128A (en) 2013-12-10 2013-12-10 An audio scene capturing apparatus

Publications (1)

Publication Number Publication Date
WO2015086894A1 (fr) 2015-06-18

Family

ID=50000445

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2014/050951 WO2015086894A1 (fr) 2013-12-10 2014-12-03 Audio scene capture apparatus

Country Status (2)

Country Link
GB (1) GB2521128A (fr)
WO (1) WO2015086894A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011101708A1 (fr) * 2010-02-17 2011-08-25 Nokia Corporation Audio capture processing using a plurality of devices
WO2012098425A1 (fr) * 2011-01-17 2012-07-26 Nokia Corporation Audio scene processing apparatus
WO2013030623A1 (fr) * 2011-08-30 2013-03-07 Nokia Corporation Audio scene mapping apparatus
US20130226324A1 (en) * 2010-09-27 2013-08-29 Nokia Corporation Audio scene apparatuses and methods

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4247002B2 (ja) * 2003-01-22 2009-04-02 Fujitsu Limited Speaker distance detection apparatus and method using a microphone array, and speech input/output apparatus using the same
US20150063070A1 (en) * 2012-02-09 2015-03-05 Nokia Corporation Estimating distances between devices


Also Published As

Publication number Publication date
GB2521128A (en) 2015-06-17
GB201321765D0 (en) 2014-01-22

Similar Documents

Publication Publication Date Title
US9820037B2 (en) Audio capture apparatus
US20130304244A1 (en) Audio alignment apparatus
US20160155455A1 (en) A shared audio scene apparatus
US20130226324A1 (en) Audio scene apparatuses and methods
US20130297053A1 (en) Audio scene processing apparatus
WO2013088208A1 (fr) 2013-06-20 Audio scene alignment apparatus
US9729993B2 (en) Apparatus and method for reproducing recorded audio with correct spatial directionality
US9195740B2 (en) Audio scene selection apparatus
US20150310869A1 (en) Apparatus aligning audio signals in a shared audio scene
US20150271599A1 (en) Shared audio scene apparatus
US9288599B2 (en) Audio scene mapping apparatus
US9392363B2 (en) Audio scene mapping apparatus
US20150302892A1 (en) A shared audio scene apparatus
US20130226322A1 (en) Audio scene apparatus
WO2015086894A1 (fr) 2015-06-18 Audio scene capture apparatus
WO2015028715A1 (fr) 2015-03-05 Directional audio apparatus
WO2014016645A1 (fr) 2014-01-30 Shared audio scene apparatus
GB2536203A (en) An apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14870035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14870035

Country of ref document: EP

Kind code of ref document: A1