NO323434B1 - System and method for producing a selective audio output signal

Info

Publication number: NO323434B1
Application number: NO20054527A
Authority: NO
Inventors: Ines Hafizovic; Vibeke Jahr; Morgan Kjolerbakken
Original assignee: Squarehead System As
Priority date: 2005-09-30
Filing date: 2005-09-30
Publication date: 2007-04-30
Also published as: PT1946606E; NO20054527D0; CN101278596B; ES2355271T3; ATE487333T1; DE602006018050D1; DK1946606T3; CN101278596A

Abstract

Method and system for digital directive focusing and steering of sampled audio within a target area to produce a selective audio output signal associated with video. In a preferred embodiment, the method and system are characterized by receiving position and focus data from one or more cameras filming an event, and using this input data to generate relevant audio together with the picture.

Description

Introduction

The present invention relates to directional sound recording, and more specifically to a method and system for producing selective audio in a video production, thereby enabling broadcasting with controlled steering and zoom functionality.

The system is suited to capturing sound in noisy conditions where spatial filtering is necessary, e.g. capturing sound from athletes, referees and coaches during sporting events for broadcast production.

The system comprises one or more microphone arrays, one or more sampling units, storage means, and a control and signal processing unit with input means for receiving positioning data.

Prior art

A microphone array is a multi-channel acoustic acquisition setup comprising two or more sound pressure sensors placed at different locations in space in order to spatially sample the sound pressure from one or more sources. Signal processing techniques can be used to control, or more specifically to steer, microphone arrays toward any source of interest. The techniques that can be used include delaying, filtering, weighting, and summing the signals from the microphone elements to achieve the desired spatial selectivity. This is referred to as beamforming. The microphones in a steerable microphone array should be well matched in amplitude and phase. If they are not, the differences must be known so that error correction can be carried out in software and/or hardware. The principles behind steering an array are well known from the relevant signal processing literature. The microphone arrays can be rectangular, circular, or three-dimensional.
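The delay, weight and sum operations named above can be illustrated with a minimal delay-and-sum sketch in Python. It is not taken from the patent: the function names, the nominal speed of sound of 343 m/s, and the rounding of delays to whole samples are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value for air at about 20 °C


def steering_delays(mic_positions, focus_point, c=SPEED_OF_SOUND):
    """Return per-microphone delays (seconds) that align a source at focus_point."""
    distances = [math.dist(m, focus_point) for m in mic_positions]
    farthest = max(distances)
    # Nearer microphones hear the source first, so they are delayed the most.
    return [(farthest - d) / c for d in distances]


def delay_and_sum(channels, delays, fs, weights=None):
    """Delay each channel by a whole number of samples, weight it, and average."""
    n = len(channels[0])
    weights = weights or [1.0] * len(channels)
    shifts = [round(d * fs) for d in delays]  # integer-sample approximation
    out = [0.0] * n
    for signal, shift, w in zip(channels, shifts, weights):
        for i in range(shift, n):
            out[i] += w * signal[i - shift]
    total = sum(weights)
    return [v / total for v in out]
```

Sound arriving from the focus point adds coherently after the delays, while sound from other locations is averaged down. A practical implementation would likely use fractional-sample delays rather than rounding.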

Several known systems deal with microphone arrays. The majority of these focus mainly on signal processing for optimizing the sampled signals and/or on interpreting the position of objects or elements in the picture.

The most relevant prior art is described below.

US 5 940 118 describes a system and method for steering directional microphones. The system is intended for use in conference systems with an audience. It comprises optical input means, i.e. a camera, interpretation means for determining which audience member is speaking, and means for activating the sound toward the sound source.

US 6 469 732 describes an apparatus and method used in a video conferencing system to provide accurate determination of the position of a speaking participant.

JP 2004 180197 describes a microphone array that can be digitally controlled with respect to acoustic focus.

The present invention is a method and system for controlling the focusing and steering of audio to be presented together with video. The invention differs from the prior art in its flexibility and ease of use.

In a preferred embodiment, the invention is a method and system for receiving position and focus data from one or more cameras filming an event, and using this input data to generate relevant audio together with the video.

In another embodiment, a user can enter the desired location from which sound is to be picked up, and signal processing means will use this to perform the necessary signal processing.

In yet another embodiment, the positioning data for the location from which sound is to be picked up may be sent from a system comprising antenna(s) that pick up radio signals from radio transmitter(s) placed on or in the object(s) to be tracked, together with means for deriving the location and sending this information to the system according to the present invention. The radio transmitter can, for example, be placed in a football, enabling the system to record sound from the location of the football and also to control one or more cameras so that both video and audio are focused on the location of the ball.

Summary of the invention

The object of the present invention is to provide selective audio with respect to the relevant target area(s).

This object is achieved by a system for digital directive focusing and steering of sampled audio within the target area to produce selective audio. The system comprises one or more broadband microphone arrays, an A/D signal converting unit, and a control unit.

The system is characterized in that the control unit comprises receiver means for receiving digital signals of captured sound from all the microphones comprised in the system, input means for receiving instructions comprising selective position data, signal processing means for selecting signals from a selection of relevant microphones in the array(s) for further processing,

signal processing means for performing signal processing on the signals from the selection of relevant microphones in order to focus and steer the audio according to the received instructions, and

signal processing means for generating a selection of sounds according to the received instructions and performing signal processing.

The object of the invention is further achieved by a method for digital directive focusing and steering of sampled audio within a target area to produce a selective audio output signal, where the method comprises the use of one or more broadband microphone arrays, an A/D signal converting unit, and a control unit.

The method is characterized in that it comprises the following steps performed by the control unit:
- receiving digital signals of captured sound from all the microphones comprised in the system;
- receiving instructions comprising selective position data through the input means of the control unit;
- selecting signals from a selection of relevant microphones in the broadband array(s) for further processing, where the selection is based on spectral analysis of the signal;
- performing signal processing on the signals from the selection of relevant microphones in order to focus and steer the audio according to the received instructions;
- generating one or more selective sounds according to the processing performed.

A main feature of the invention is that the selective position data can be provided in real time or during later processing of stored audio. The focus area(s) from which audio is to be produced can be defined by an end user entering instructions about the area(s), or by the positioning and focusing of one or more cameras.

The objects of the invention are achieved by means and methods as set forth in the attached set of claims.

Brief description of the drawings

The invention will be described in more detail with reference to the drawings, where:
Fig. 1 shows an overview of the various system components integrated with cameras.
Fig. 2 shows a setup that can provide audio from different locations to a surround system, depending on the cameras in use.
Fig. 3 shows examples of frequency optimization with spatial filtering in the array design.

Detailed description of preferred embodiments

Fig. 1 shows an overview of the various system components integrated with cameras.

The components shown in the drawing are broadband microphone arrays 100, 110, which are to be positioned adjacent to the area from which sound is to be recorded. The analog signal from each microphone is converted to a digital signal in an A/D converter 210 comprised in an A/D unit 200. The A/D unit may also have memory means 220 for storing digital signals, and transmission means 230 for transmitting digital signals to a control unit 300.

The control unit 300 can be located at a remote location and receive the digital signals of the recorded sound over a wired or wireless network, e.g. via cable or satellite, so that the end user can perform all steering and focusing signal processing locally. The control unit 300 comprises a data receiver 310 for receiving digital audio signals from the A/D unit 200. It further comprises data storage means 320 for storing received signals, signal processing means 330 for real-time or subsequent processing, and sound generating means 340 for generating a selective sound. Before the signal is stored in the data store, it can be converted to a compressed format to save space.

The control unit 300 further comprises input means 350 for receiving instructions comprising selective position data. These instructions are typically coordinates defining the position and focus point of one or more cameras filming an event taking place at one or more specific locations within the target area.

In a first embodiment, the coordinates of the sound source can be provided by the focus point of the camera(s) 150, 160 and by the azimuth and elevation of the camera stand(s). By connecting the system to one or more television cameras and receiving position coordinates in two or three dimensions (azimuth, elevation and distance), it is possible to steer and focus the audio according to the focus point of the camera lens.
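As a rough illustration of how azimuth, elevation and distance from a camera could be turned into a focus point for the array, the following Python sketch converts them to Cartesian coordinates. The axis conventions are assumptions, not part of the patent: azimuth is measured counter-clockwise from the x axis in the horizontal plane, elevation is measured upward from the horizontal, and the camera position is already expressed in a coordinate frame shared with the microphone array. The function name is hypothetical.

```python
import math


def camera_focus_to_xyz(camera_position, azimuth_deg, elevation_deg, distance_m):
    """Convert a camera's pan/tilt/focus-distance reading to a 3-D focus point.

    camera_position is (x, y, z) in the shared frame (an assumed convention).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = distance_m * math.cos(el)  # projection onto the ground plane
    x0, y0, z0 = camera_position
    return (
        x0 + horizontal * math.cos(az),
        y0 + horizontal * math.sin(az),
        z0 + distance_m * math.sin(el),
    )
```

For example, a camera mounted 5 m above the ground and tilted 30 degrees downward with a focus distance of 10 m is focused on a point at ground level.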

In a second embodiment, the coordinates, and thus the location of the sound source, can be provided by an operator using a graphical user interface (GUI) that shows an overview of the target area, together with a keyboard, an audio mixing unit, and one or more joysticks. The GUI gives the operator information about where to steer and zoom.

The GUI can show live video from one or more connected cameras (multiple channels). In a preferred embodiment, additional graphics are overlaid on the GUI to indicate where the system is steering. This simplifies use of the system and gives the operator full control over the zoom and steering functions.

In a third embodiment, the system can use algorithms to find predefined sound sources. For example, the system can be set up to listen for a referee's whistle and then steer the focus of audio and video to that location.

In yet another embodiment, the location or coordinates can be provided by the system tracking the location of an object, e.g. a football being played in a playing area.

A combination of the above embodiments is also a possible alternative.

For the audio to be synchronized with the focus area of the camera(s), the system needs a common coordinate system. The coordinates from the cameras are calibrated relative to a reference point common to the system and the cameras.

The system can capture sound from several different locations simultaneously (multi-channel functionality) and provide audio to a surround system. The locations can be predefined for each camera or changed dynamically in real time according to camera position, focus and angle.

Selective audio is achieved by combining the audio signals with the position data and performing the necessary signal processing in the signal processor.

The signals from the microphones can be sampled simultaneously for all microphones, or the microphone signals can be multiplexed before analog-to-digital conversion.

The signal processing comprises spatial and spectral beamforming, and calculation of the signal delay caused by multiplexed sampling so that corrections can be made in software or hardware.
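The correction for multiplexed sampling can be sketched as follows, under an assumption the patent does not state: a single converter visits the channels in round-robin order with a uniform time slot, so channel k is sampled k slots later than channel 0 within each frame. The function names are hypothetical.

```python
def multiplex_skews(num_channels, frame_rate_hz):
    """Sampling-time offset (seconds) of each channel within one multiplexed frame."""
    slot = 1.0 / (num_channels * frame_rate_hz)  # assumed uniform slot length
    return [k * slot for k in range(num_channels)]


def skew_corrected_delays(steering_delays, skews):
    """Add each channel's sampling skew to its steering delay.

    A channel sampled later in the frame records a given instant at an earlier
    sample index, so it needs a correspondingly larger delay to line up.
    """
    return [d + s for d, s in zip(steering_delays, skews)]
```

The resulting offsets are fractions of a sample period, so applying them would require interpolation or fractional-delay filtering rather than a plain index shift.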

The signal processing further comprises calculation of the sound pressure delay from the sound target to the microphone array, for the purpose of synchronizing the signal with a predefined time delay.

The signal processing comprises regulating the sampling rate of selected microphone elements to achieve optimal signal sampling and processing.

The signal processing enables dynamic selective audio with panning, tilting and zooming of the audio to one or more locations simultaneously, and also provides audio to one or more channels, including surround systems.

The signal processing also provides a variable sampling frequency (Fs). Fs is higher for microphone elements that are active at high frequencies than for elements that are active at low frequencies. Choosing Fs based on the signal spectrum and the Rayleigh criterion (a sampling rate at least twice the signal frequency) gives optimal signal sampling and processing, and reduces the amount of data to be stored and processed.
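The effect of a per-element sampling frequency on data volume can be sketched in Python. The rule used is the one quoted in the text (a sampling rate at least twice the highest signal frequency, more commonly referred to as the Nyquist criterion); the function names, the optional margin factor and the 2 bytes per sample are assumptions for illustration.

```python
def minimum_sampling_rate(highest_frequency_hz, margin=1.0):
    """Lowest sampling rate that is at least twice the highest frequency of interest."""
    if highest_frequency_hz <= 0:
        raise ValueError("highest_frequency_hz must be positive")
    return 2.0 * highest_frequency_hz * margin


def data_rate_bytes_per_second(element_bands_hz, bytes_per_sample=2):
    """Total data rate when each element is sampled at its own minimum rate."""
    return sum(minimum_sampling_rate(f) * bytes_per_sample for f in element_bands_hz)
```

With two elements used up to 4 kHz and one used up to 16 kHz, per-element rates give 96 000 bytes per second, compared with 192 000 bytes per second if all three were sampled at the rate needed for 16 kHz.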

The signal processing comprises changing the aperture of the microphone array to achieve a given frequency response and to reduce the number of active elements in the microphone array.

The focus point(s) determine which spatial weighting functions are used to adjust the degree of spatial beamforming, with focusing and steering performed by a delay-and-sum beamformer, and to change the sidelobe level and the beamwidth.

Spatial beamforming is performed by selecting a weighting function from among Cosine, Kaiser, Hamming, Hanning, Blackman-Harris and Prolate Spheroidal according to a selected beamwidth of the main lobe.
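Three of the listed weighting functions have simple closed forms and can be sketched in plain Python, together with the normalized response of a uniformly spaced linear array. This is an illustration only: the linear geometry, the function names and the symmetric window definitions are assumptions, and the Kaiser, Blackman-Harris and prolate spheroidal windows are omitted for brevity.

```python
import math


def cosine_window(n):
    """Half-cycle sine (cosine) taper over n elements."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def hann_window(n):
    """Symmetric Hann (Hanning) taper over n elements (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def hamming_window(n):
    """Symmetric Hamming taper over n elements (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def array_response(weights, spacing_m, wavelength_m, angle_deg):
    """Normalized magnitude response of a weighted uniform linear array.

    angle_deg is measured from broadside (perpendicular to the array axis).
    """
    phase_step = 2 * math.pi / wavelength_m * spacing_m * math.sin(math.radians(angle_deg))
    re = sum(w * math.cos(i * phase_step) for i, w in enumerate(weights))
    im = sum(w * math.sin(i * phase_step) for i, w in enumerate(weights))
    return math.hypot(re, im) / sum(weights)
```

Tapered weights generally trade a wider main lobe for lower sidelobes, which is why the choice of weighting function is tied to the desired main-lobe beamwidth. Evaluating `array_response` over a range of angles for each window is one way to compare them.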

The system samples the acoustic sound pressure from all elements, or from a selection of elements, in all the arrays and stores the data in a storage unit. Sampling can be performed simultaneously for all channels or multiplexed. Since the entire sound field is sampled and stored, all steering and zoom signal processing of the audio can be carried out as post-processing in addition to real-time processing (going back in time and extracting sound from any location). Post-processing of stored data offers the same functionality as real-time processing, and an operator can obtain audio from any desired location that the system covers.

Since synchronization with external audio and video equipment is very important, the system is able to estimate and compensate for the audio signal delay caused by the propagation time of the signal from the sound source to the microphone array(s). The operator sets the maximum area that the system needs to cover, and the maximum time delay is then calculated automatically. This can be used as the output delay of the system, and all audio leaving the system will have this delay.
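The automatic calculation of the maximum delay can be sketched as the longest propagation time from any point of the coverage area to any array. In this Python sketch the coverage area is represented by its corner points and the speed of sound is fixed at 343 m/s; both are assumptions, and the function name is hypothetical.

```python
import math


def max_output_delay(array_positions, coverage_corners, c=343.0):
    """Longest propagation time (seconds) from the coverage area to any array.

    For a convex coverage area the farthest point from an array is one of the
    corners, so checking the corners is sufficient.
    """
    longest = max(
        math.dist(a, p) for a in array_positions for p in coverage_corners
    )
    return longest / c
```

For an array mounted 10 m above the centre of a 100 m by 60 m field, the farthest corner is about 59.2 m away, giving a delay of roughly 0.17 s.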

By using various sensors, the system can correct for errors in sound propagation caused by temperature gradients, humidity in the medium (air), and movements in the medium caused by wind and the exchange of warm or cold air.
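Of the corrections mentioned, the temperature dependence is the simplest to illustrate. The Python sketch below uses the common linear approximation for the speed of sound in dry air, c ≈ 331.3 + 0.606·T m/s with T in degrees Celsius. The patent does not specify a formula, so this choice, and the function names, are assumptions. Humidity and wind are not modelled here.

```python
def speed_of_sound(temperature_c):
    """Approximate speed of sound in dry air (m/s) for a temperature in °C."""
    return 331.3 + 0.606 * temperature_c


def propagation_delay(distance_m, temperature_c):
    """Propagation time (seconds) over distance_m at the given air temperature."""
    return distance_m / speed_of_sound(temperature_c)
```

Over 100 m the propagation time differs by roughly 16 ms between 0 °C and 30 °C, which is large compared with the sub-millisecond delay differences a beamformer relies on.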

Fig. 2 shows a setup that can provide audio from different locations to a surround system, depending on the cameras in use. The figure shows a playing area 400 with a microphone array 100 located at the centre of and above the playing area 400. The figure further shows a camera 150 covering the shortest side of the playing area 400, and another camera 160 covering the longest side of the playing area 400.

Using this setup, the present invention can provide relevant audio on a plurality of channels (CH1-CH4) for the scene covered by each camera.

By receiving location information from a system comprising a radio transmitter, placed in a ball being played on a playing field, and antenna(s) for picking up the radio signals, it is possible to have a system that always picks up the sound from the location where the action is, and, for example, to have this sound reproduced in the centre channel of a surround system.

Fig. 3 shows examples of changing the aperture for frequency optimization with spatial filtering in the array design.

The system can dynamically change the aperture of the array to obtain an optimized beam according to the desired beamwidth, frequency response and array gain. This can be achieved by processing data only from selected array elements, and in this way the system can reduce the amount of signal processing required.

Black dots indicate active microphone elements and white dots indicate passive microphone elements.

A shows a microphone array with all microphone elements active. This configuration gives the best response and directivity across the whole spectrum that the array covers.

B shows a high-frequency optimized thinned array that can be used when no low-frequency sound is present or when no spatial filtering of low frequencies is required.

C shows a mid-frequency optimized thinned array that can be used when no low- or high-frequency sound is present, or when no spatial filtering of low or high frequencies is desired, e.g. when only normal speech is present.

D shows a low-frequency optimized thinned array that can be used when no high-frequency sound is present or when no spatial filtering of high frequencies is required.
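The choice of active elements in configurations such as A to D can be illustrated with a minimal sketch. It is not taken from the patent: it assumes a uniform linear array, picks the element spacing from the half-wavelength rule at the highest frequency of interest, and picks the aperture as a fixed number of wavelengths at the lowest frequency of interest. The function name `active_elements`, the parameter `aperture_wavelengths` and the speed of sound of 343 m/s are hypothetical illustration values.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def active_elements(n_elements, spacing_m, f_low_hz, f_high_hz,
                    aperture_wavelengths=4.0):
    """Return indices of the elements kept active in a uniform linear array.

    Assumption (not from the patent): the spacing between active elements
    follows the half-wavelength rule at f_high_hz, and the aperture spans
    aperture_wavelengths wavelengths at f_low_hz, capped to the array length.
    """
    # Largest stride (in elements) whose spacing is still <= lambda/2 at f_high.
    max_spacing = SPEED_OF_SOUND / (2.0 * f_high_hz)
    stride = max(1, int(max_spacing / spacing_m))

    # Aperture wanted at the lowest frequency, expressed in element steps.
    wanted_aperture = aperture_wavelengths * SPEED_OF_SOUND / f_low_hz
    span_limit = min(int(wanted_aperture / spacing_m), n_elements - 1)

    # Whole strides that fit inside the aperture, centred on the array.
    steps = span_limit // stride
    first = max((n_elements - 1 - steps * stride) // 2, 0)
    return [first + i * stride for i in range(steps + 1)]


# Example: 33 elements spaced 2 cm apart (hypothetical geometry).
print(active_elements(33, 0.02, 200, 8000))   # wide band: all elements active
print(active_elements(33, 0.02, 4000, 8000))  # high band: dense, short aperture
print(active_elements(33, 0.02, 200, 1000))   # low band: sparse, full aperture
```

Under these assumptions the high-frequency case keeps a dense group of central elements, while the low-frequency case keeps a few widely spaced elements across the full aperture, so fewer channels need to be processed in either thinned case.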

Several adaptations of the system are possible, enabling different ways of using it. The signal processing, and thus the final output sound, can be carried out locally or at a remote location.

By enabling signal processing at a remote location, it is possible for an end user, who is for example watching a sporting event on TV, to control the locations from which sound is received. Signal processing means can be located at the end user, and the user can enter the locations from which he or she wishes to receive sound. An input device for entering locations can, for example, be a mouse or joystick that controls a cursor on the screen where the sporting event is displayed. The signal processing means 300 with its output and input means 340, 350 can then be implemented in a set-top box.

Alternatively, the end user can send position data to signal processing means located at a different location from that of the end user, and then receive the processed and steered sound from the relevant position(s).
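How position data entered by a user could be turned into focused sound can be sketched with a basic delay-and-sum beamformer. This is a simplified illustration under stated assumptions, not the implementation described in the patent: delays are rounded to whole samples, no spatial weighting is applied, and the names `focus_delays` and `delay_and_sum`, the microphone coordinates and the speed of sound are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def focus_delays(mic_positions, focus_point):
    """Compensation delay in seconds per microphone for a given focus point."""
    travel = [math.dist(m, focus_point) / SPEED_OF_SOUND for m in mic_positions]
    latest = max(travel)
    # Delay every channel so it lines up with the microphone farthest away.
    return [latest - t for t in travel]


def delay_and_sum(signals, delays, sample_rate):
    """Average the channels after shifting each by its delay (whole samples)."""
    shifts = [round(d * sample_rate) for d in delays]
    length = min(len(s) for s in signals)
    out = []
    for n in range(length):
        acc = 0.0
        for sig, k in zip(signals, shifts):
            if n - k >= 0:  # samples before the start of the recording are zero
                acc += sig[n - k]
        out.append(acc / len(signals))
    return out


# Two microphones 3.43 m apart; the user-selected focus point is at the first one.
mics = [(0.0, 0.0), (3.43, 0.0)]
delays = focus_delays(mics, (0.0, 0.0))

# An impulse emitted at the focus point reaches the second microphone 10 ms later.
sig0 = [1.0] + [0.0] * 19
sig1 = [0.0] * 10 + [1.0] + [0.0] * 9
focused = delay_and_sum([sig0, sig1], delays, sample_rate=1000)
```

In this example the two impulses are aligned and add coherently, while sound from other positions would be misaligned and attenuated by the averaging. A practical system would typically use fractional delays and per-element weights.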

Claims

1. System for digital directive focusing and control of sampled sound within a target area (400) to produce a selective audio output signal, comprising one or more broadband arrays of microphones (100, 110), an A/D signal converting unit (200) and a control unit (300), wherein the system is characterized in that the control unit (300) comprises: - receiver means (310) for receiving digital signals of captured sound from all the microphones included in the system; - input means (350) for receiving instructions comprising selective position data; - signal processing means (330) for selecting signals from a selection of relevant microphones in the array(s) (100, 110) for further processing; - signal processing means (330) for performing signal processing on the signals from the selection of relevant microphones to focus and control the sound according to the received instructions; and - signal processing means (330) for generating a selection of sounds according to the received instructions and the signal processing performed.

2. System according to claim 1, characterized in that the control unit (300) is located at a remote location and comprises means (310) for receiving digital signals of captured sound over a wired or wireless network.

3. System according to claim 1, characterized in that the input means (350) in the control unit (300) comprise means for receiving selective position data over a wired or wireless network.

4. System according to claim 1, characterized in that the control unit (300) further comprises data storage means (320) for storing received digital signals of captured sound.

5. System according to claim 1, characterized in that the control unit (300) performs signal processing on several channels based on one or more different input coordinates.

6. System according to claim 1, characterized in that the control unit (300) comprises means for changing the aperture of the microphone array(s) (100, 110) based on the spectral components of the incoming sound.

7. System according to claim 4, characterized in that the control unit (300) further comprises means for converting the received signals into a compressed format before they are stored in the storage means (320).

8. System according to claim 1, characterized in that the control unit (300) further comprises means for controlling and focusing one or more cameras based on received instructions comprising selective position data.

9. Method for digital directive focusing and control of sampled sound within a target area (400) to produce a selective audio output signal, the method comprising using one or more broadband arrays of microphones (100, 110), an A/D signal converting unit (200), and a control unit (300), characterized in that the method includes the following steps performed by the control unit (300): - receiving digital signals of captured sound from all the microphones included in the system; - receiving instructions comprising selective position data through the input means (350) in the control unit (300); - selecting signals from a selection of relevant microphones in the broadband array(s) (100, 110) for further processing, where the selection is based on spectral analysis of the signal; - performing signal processing on the signals from the selection of relevant microphones to focus and control the sound according to the received instructions; and - generating one or more selective sounds according to the processing performed.

10. Method according to claim 9, characterized in that received digital signals are in a compressed format.

11. Method according to claim 9, characterized in that received digital signals of captured sound from all the microphones in the array(s) (100, 110) are stored in a data store (320).

12. Method according to claim 9, characterized in that the signal processing unit (300) performs the signal processing in real time.

13. Method according to claims 9 and 11, characterized in that the signal processing unit (300) performs signal processing in a subsequent post-processing step by using stored signals of captured sound.

14. Method according to claim 9, characterized in that the signal processing includes spatial and spectral beamforming.

15. Method according to claim 9, characterized in that the signal processing includes multiplexed sampling and calculation of signal delay, due to multiplexing, in order to perform corrections in software or hardware.

16. Method according to claim 9, characterized in that the signal processing comprises calculation of sound pressure delay from the sound target to the array of microphones with the intention of synchronizing the signal with a predefined time delay.

17. Method according to claim 9, characterized in that the signal processing enables a dynamic selective audio output signal for zooming and panning the sound to one or more locations simultaneously, and also for providing sound in one or more channels, including surround systems.

18. Method according to claim 9, characterized in that the signal processing includes regulation of the sampling rate on selected microphone elements to achieve optimal signal sampling and processing.

19. Method according to claim 9, characterized by changing the aperture of the microphone array to achieve a given frequency response and reduce the number of active elements in the microphone array.

20. Method according to claim 9, characterized in that received selective position data comprises coordinates in two or three dimensions to define focusing point(s).

21. Method according to claim 20, characterized in that received selective position data comes from a system that tracks one or more objects.

22. Method according to claims 14 and 20, characterized by position data determining which spatial weighting functions are to be used to adjust the degree of spatial beamforming, with focusing and control by delay-and-sum beamforming, and change of side lobe level and beam width.

23. Method according to claim 22, characterized in that spatial beamforming is performed by selecting a weighting function from among Cosine, Kaiser, Hamming, Hanning, Blackman-Harris and Prolate Spheroidal according to the selected beam width for the main lobe.

24. Method according to claim 20, characterized in that the coordinates are defined by the position of the focusing point(s) of one or more camera(s) that film an event that takes place at specific location(s) within the target area.

25. Method according to claim 20, characterized in that the coordinates are defined by a user who controls a user interface comprising one or more displays showing an overview of the target area, a keyboard, a sound mixing unit, and one or more joysticks.

26. Method according to claim 20, characterized in that the coordinates are used to control and focus one or more cameras.

27. Method according to claim 17, characterized in that the dynamically selected audio output signal in a surround system is coherent with one or more camera(s).