WO2015032009A1

WO2015032009A1 - Small system and method for decoding audio signals into binaural audio signals

Info

Publication number: WO2015032009A1
Application number: PCT/CL2014/000043
Authority: WO
Inventors: Pablo RECABAL GUIRALDES; Cristián URRUTIA SOTO; Osvaldo TRAVIESO MANSO; Álvaro MUÑOZ NÚÑEZ
Original assignee: Recabal Guiraldes Pablo; Urrutia Soto Cristián; Travieso Manso Osvaldo; Muñoz Núñez Álvaro
Priority date: 2013-09-09
Filing date: 2014-09-09
Publication date: 2015-03-12

Abstract

The invention relates to a reduced-size system and method for binaural recording, which can record sound and decode it into a three-dimensional format that can be reproduced in three dimensions using reproduction devices such as headphones or conventional earphones. Said method and system offer a portable recording solution, preferably of millimeter dimensions, that can be built into professional or household recording devices. The invention further relates to a computer program for binaural recording, and to a method for producing the recording system of the invention.

Description

METHOD AND REDUCED-SIZE SYSTEM FOR DECODING AUDIO SIGNALS INTO BINAURAL AUDIO SIGNALS

DESCRIPTIVE MEMORY

FIELD OF APPLICATION

The present invention consists of a reduced-size system for the binaural recording and/or reproduction of audio signals, which allows the user to experience three-dimensional sound. In addition, the invention proposes an associated methodology to carry out said recording and/or reproduction, a computer program associated with said methodology, and the manufacturing process of said system.

BACKGROUND OF THE INVENTION

The technology to record and/or reproduce video in three dimensions (3D) is an important innovation in the entertainment industry, successfully applied both on a large scale, such as in cinemas, and on a smaller scale, for example in portable and/or home systems. However, the recording and/or reproduction of three-dimensional sound has not developed at the same pace as its video counterpart. As a result, the three-dimensional sound that users experience when listening to a recording is not yet comparable to what a human being actually hears in natural circumstances.

In this context, human beings are able to identify the location of sounds around them very precisely (behind, to the sides, above, below, near, far, etc.). The human auditory system perceives sound differently in each ear according to each specific sound source and its location. For its part, the human brain processes these differences, which allows it to identify the direction and distance of the origin of the sound.

Currently, the most common way to listen to recorded sound is through stereo speakers. There are several types of speakers, in terms of dimensions and principles of operation, but all are based on the transformation of electrical energy into mechanical energy and, finally, acoustic waves. Most of the speakers used today (in theaters, cars, shops, studios, headphones and home appliances) work with an electromagnet that moves a coil, which in turn moves a cone that generates acoustic waves.

On the other hand, the commercial standard currently used in most modern movie theaters and in the homes of the most demanding consumers is the surround system. This system defines the use of five or more channels with speakers generally organized in a horizontal plane, in addition to a low frequency channel with 10% of the sound intensity of the rest of the channels. Given the organization of the speakers, an enveloping (or surround) effect of the sound can be generated in the plane of the speakers, partially similar to a three-dimensional field of sounds, as long as the listener is in a central position in relation to the organization of the speakers.

The alternative that solves the practical, economic and technical deficiencies of the surround system is the simulation of the human auditory process that allows identifying the origin of sounds through the use of headphones. This technology is known as binaural reproduction, and is capable of providing a sensation of three-dimensional immersion, personalized for each listener. The reproduction of sound material in 3D has been the subject of research for several decades, both in the surround format and with binaural technology. In this context, researchers have made measurements with an artificial head for the simulation of sounds in the three-dimensional field, developing applications of robotic localization; and the personalization and parameterization of anthropometric models (also called structural models) that allow simulating 3D sounds for any physiognomy, among others. In this regard, it is well known that there are differences of time, phase and intensity for the same sound perceived in each eardrum, where also effects such as diffraction, refraction and absorption of sound waves by the torso, shoulders, head and the outside of the ear, modify the spectrum of sound that reaches the eardrums. These phenomena are those that naturally allow the human being to perceive the sound in three dimensions, being able to locate the position of a sound source around it.

In this context, the binaural recording of environmental sounds is currently performed with any of the following techniques:

• Recording with a dummy head. It is achieved with a pair of microphones located inside the ears of a model head specially designed for this purpose.

• Recording with binaural ear microphones. It uses microphones designed to be placed in or near the ear canals of a human being, just like hearing aids.

• Recording with Otokinoko-type microphones. It approximates the binaural effect produced by the human head by means of a device that emulates the shape of some of the asymmetries that exist in human physiology.

These techniques have various practical limitations and/or problems for reliable binaural reproduction. In the case of recording with a dummy head, the technique is not very portable, since it requires installing a human-scale model (with torso) at the recording location. In the case of binaural microphones inserted in the ears, the solution is not integrated in terms of hardware and does not provide consistency from an audio perspective when associated with the simultaneous capture of other media formats. In the case of Otokinoko-type microphones, the simplified approximation of human anatomy is not sufficient by itself to achieve faithful binaural reproduction, and it has limitations in scaling down to smaller recording devices. Finally, all these techniques are designed for a standard or average anatomy, so reproduction suffers from a generalization problem that depends on how far the listener's anatomy is from the average human anatomy.

For the binaural synthesis of specific sounds, a mixing technique can be used that requires measuring the Head-Related Transfer Function (HRTF). This function is obtained by measuring the response in each ear against a pulse-type signal (usually at the outer end of the ear canal). The result characterizes the way in which the sound is perceived by the listener, since it contains implicitly the physiognomy of the listener. Due to the intrinsic ability of human beings to locate sounds, the idea of positioning a sound with a high degree of fidelity in the 3D field of the listener is plausible, when applying the HRTF to said sound.
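
As an illustration of applying an HRTF to a sound, the following minimal sketch convolves a mono signal with a left and a right head-related impulse response (the time-domain counterpart of the HRTF). The function names and the toy impulse responses are hypothetical and chosen only for illustration; they are not measured data and are not taken from the cited work.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Return the (left, right) ear signals for one source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (hypothetical values, not measurements): the right
# ear receives the sound two samples later and at half the amplitude.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]
mono = [1.0, -1.0, 0.5]

left, right = binauralize(mono, hrir_left, hrir_right)
print(left)
print(right)
```

A practical system would use measured impulse responses with hundreds of taps and a faster convolution method; the direct form is used here only to keep the sketch self-contained.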

Due to the direct relationship between the HRTF and the anatomy of the model on which the measurements are made, this function can vary considerably from person to person. The differences may increase depending on the location of the stimulus relative to the user, which has been studied in depth for changes in azimuth, elevation and distance. In this context, research has measured how listeners respond to a generic HRTF, observing that the response in the horizontal plane does not vary greatly between listeners, whereas using a generic HRTF for variations in elevation leads to a high error rate in locating sounds in the three-dimensional field.

Along the same lines, several studies address HRTF samples for specific anatomies. In 2001, the CIPIC Interface Laboratory of the University of California, Davis examined the sampling differences in 45 different subjects, publishing the results in V. R. Algazi et al., "The CIPIC HRTF database", in Proc. 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 99-102, 2001. In analysis and modeling, it is possible to find developments such as the one published for robotic localization applications by C. Pinho et al., entitled "A Bayesian Binaural System for 3D Sound-Source Localization", in Cognitive Systems, (Karlsruhe, Germany), 2008. Likewise, it is also possible to find developments related to the personalization and parameterization of anthropometric models (also called structural models) that allow the simulation of 3D sounds for any physiognomy. Recent research proposes a transformation function for a particular ear and its relationship with the HRTF, which has been called the PRTF (Pinna-Related Transfer Function). In particular, in 2011 M. Geronazzo et al. ("Customized 3d sound for innovative interaction design", in Proc. Italian ACM SigCHI Conf. on Computer-Human Interaction, (Alghero, Italy), pp. 1-3, 2011) developed a customized structural model for an anatomy that explains the relationship between the elevation of a sample and the radius of the head as part of a structural model of the HRTF, and in particular the relationship between the azimuth of a sample and the dimensions of the ear as a function of the PRTF. Another study, conducted by D. J. Kistler et al. ("A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction", The Journal of the Acoustical Society of America, vol. 91, no. 3, pp. 1637-1647, 1992), proposes a model of five basis functions obtained from Principal Component Analysis (PCA) to approximate any HRTF, demonstrating that sound localization with the modeled HRTF practically did not vary with respect to the real one.

On the other hand, US patent 8,265,284 discloses an apparatus for generating a binaural audio signal that includes a demultiplexer and decoder that receives audio information consisting of an M-channel audio signal that is a downmix of an N-channel audio signal, together with spatial parameter data for upmixing the M-channel audio signal to the N-channel audio signal. According to said document, a conversion processor converts spatial parameters of the spatial parameter data into first binaural parameters in response to at least one binaural perceptual transfer function. Then, a matrix processor converts the M-channel audio signal into a first stereo signal in response to the first binaural parameters, and a stereo filter generates the binaural audio signal by filtering the first stereo signal. The filter coefficients for the stereo filter are determined by a coefficient processor in response to at least one of the binaural perceptual transfer functions, wherein said transfer function is an HRTF. In this respect, US 8,265,284 relates to the generation of a binaural signal for content that has previously been mixed with spatial characteristics, such as a 5.1 surround mix of music or sound for images. Because it uses only parameterized HRTFs of human auditory perception, it does not attempt to model the response of the acquisition system with which a 3D signal would be captured. Therefore, said system would be imprecise in transforming the signals of an arrangement of microphones with a characteristic spatial pattern into a binaural signal, since it does not take into account the structure of the components involved in the capture of sound waves as a relevant parameter for audio processing with a transfer function.

As can be noted, the challenge addressed by the present invention is to design a system and methodology for binaural recording that can be adapted to a professional or home 3D video camera, as well as to any other type of reduced-size device, and that resolves the current technological limitations described above.

DESCRIPTION OF THE INVENTION

The present invention addresses the challenge of offering a recording system of reduced size, preferably of millimeter dimensions, capable for example of being used inside a video camera, smartphone or even smaller devices. Said system captures acoustically filtered sound waves and decodes them in such a way that, when reproduced through conventional headphones or an earphone-type device, the sound is heard exactly as an observer would experience it from the perspective of the scene. Additionally, the invention also discloses a recording method used by the system identified above for the processing of the audio signal, a computer program that applies said method, and the manufacturing process of said system.

The specific objects of this invention are to provide a sound recording system having, in one embodiment of the invention, an acoustic filter device that minimizes the correlation in the response to the same sound originating from any pair of points with different spherical angles, and a binaural processing decoder device that takes the sound response measured within two acoustic filters and recovers the sound location information implicit in the audio signals, transforming it into a pair of signals with the characteristics they would need in order to be heard by a human being.

For a better understanding of the technology described in the present invention it is necessary to understand the essentials about listening in three dimensions and the simulation of this phenomenon by binaural recording.

For a specific sound source, the human auditory system perceives the sound differently in each ear according to the location of that source. Based on this, the brain uses a series of signals derived from the perception of sound to calculate this location, of which the most important are:

• Interaural time differences, given by the delay of the arrival of the sound wave to the ear furthest from the source;

• Interaural level differences, which correspond to the differences in intensity with which the sound wave reaches each ear;

• Interaural phase differences, given by the different phase of the sound wave in each ear; and

• Differences in the sound spectrum, given by the absorption, resonance and/or diffraction of certain frequencies, generated by the physiognomy of the listener as a function of the relative location of the sound source. In particular, the effects produced by the ears, head, neck and torso have a strong influence on this type of difference.

All these characteristics can be represented in a vector $(X_l, X_r)$, in which $X_l$ and $X_r$ contain the input information of the signal (for example, the level, phase and spectrum representation of the sound) at a given moment in the left and right ear respectively.
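
The first two cues in the list above can be estimated directly from a pair of ear signals. The sketch below is one common way of doing so and is offered only as an illustration: it assumes the time difference is taken as the lag that maximizes the cross-correlation and the level difference as the ratio of RMS values in decibels. The function names, the sample rate and the test burst are hypothetical.

```python
import math


def estimate_itd(x_left, x_right, sample_rate, max_lag):
    """Delay in seconds of the right-ear signal relative to the left-ear signal.

    The delay is the lag that maximizes the cross-correlation; a positive
    value means the sound reached the left ear first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(x_left):
            m = n + lag
            if 0 <= m < len(x_right):
                score += value * x_right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def estimate_ild_db(x_left, x_right):
    """Level difference in dB of the left signal relative to the right one."""
    return 20.0 * math.log10(rms(x_left) / rms(x_right))


sample_rate = 48000  # assumed sampling frequency in Hz
burst = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x_left = burst
# Right ear: the same burst three samples later and at half the level.
x_right = [0.0, 0.0, 0.0] + [0.5 * v for v in burst[:-3]]

itd = estimate_itd(x_left, x_right, sample_rate, max_lag=5)
ild = estimate_ild_db(x_left, x_right)
print(itd, ild)
```

Here the right-ear signal is a delayed, attenuated copy of the left-ear signal, so the estimated delay is three sample periods and the level difference is about 6 dB.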

Part of the scientific literature to this day focuses on determining which of these signals have more or less importance in the location of different sounds, and in the study of other signals and processes involved in localization. Notwithstanding the foregoing, new methods have been created that allow the recording and reproduction of sound in an acceptably faithful manner to how we perceive it in reality (in 3D), which can be grouped in one of the following categories:

The first is the use of a physical model to record sounds, which interprets the surrounding sound in a way similar to human anatomy. In other words, a recording technique capable of capturing the four cues indicated above that the brain uses to locate sounds. This category includes the aforementioned dummy head method, the well-known Jecklin disc, the microphones developed by the Japanese company Otokinoko, and the methods that use binaural microphones placed in the ears of the person recording. In all cases, the device is used to make the recordings in situ, that is, the location of the sounds is given by the relative position of the head at the time of recording. It is possible to make the brain believe that the sound actually comes from the desired location, provided that the listener uses headphones positioned similarly to the microphones used to make the recording and, naturally, that the characteristics of the model are as similar as possible to those of the average human physiognomy. In general, these methods achieve sound localization at the expense of the size and portability of the device, and they suffer from a generalization problem in relation to the specific anatomy of each listener.

The second category groups the methods that use a mathematical model of the dimensions of a particular head, together with digital sound processing, to synthesize audio signals that the brain interprets as binaural. The best known of these models uses an HRTF, which is obtained for each head by measuring the response of microphones located inside the ears when stimulated by sounds containing all the frequencies in the human auditory range. These sounds are emitted from different positions in 3D space, in a place where no sound reflections interfere within the duration of the HRTF (typically an anechoic chamber), and the microphone recordings are evaluated and stored for each location. By mathematical convolution of any sound with the response obtained for a specific location, it is possible to make the brain believe that the sound actually comes from the desired location, provided that the listener uses headphones positioned similarly to the microphones used to obtain the HRTF. This method achieves a personalized and very precise result for the anatomy used when making the measurements (which in particular can be a dummy head), and can therefore result in low-fidelity audio for a listener whose physiology differs from the one used to make the measurements. In addition, it only serves to reproduce a finite (and, for practical reasons, small) number of signals to be placed in the 3D field of the listener, thus excluding all continuous environmental sounds, such as the sound of the sea or of rain in the forest.

In this context, the present invention seeks to resolve the limitations of the methods in both categories by introducing a new recording technology that fulfills the following requirements:

• It allows faithful recording and reproduction of sounds with a continuous origin in space (environmental sounds);

• It can be adapted to millimeter dimensions to be installed in small devices for both professionals and consumers;

• It is able to use a transform to map the sounds recorded by the system into a representation of the sound as it would be heard with a human anatomy.

Then, in order to comply with the aforementioned requirements, the invention consists of developing a method and devices belonging to both the first and second categories, combining a physical system for recording binaural sounds with a methodology for estimating a Binaural Transformation (BF) of the captured signals.

First, the invention consists of developing a method and devices belonging to the first category, that is, a physical method that minimizes the correlation in the response to the same sound originating from any pair of points with different spherical angles. Said method and devices are capable of recovering all the information that allows a human being to locate sounds in space, since minimizing the correlation in the response of the system to equal signals emitted from different spherical locations helps to preserve the information related to the location of the sound. To this end, one embodiment of the system of the invention proposes an acoustic filter device whose materials and specifications are parameterized for each spherical angle of a defined discretization or sampling grid. According to the method of the invention, this device, called the Angular Parameterization Acoustic Filter (APAF), is applied to a pair of sound capturing devices, for example microphones of dimensions according to the specifications, in order to measure more thoroughly the transfer function of this system, obtained from its input and output signals. Subsequently, and based on the second category of methods for binaural recording, an algorithm is developed to decode the information captured by the Microphones-Filter System (MFS), or acquisition system, through a combination of existing artificial intelligence algorithms for function approximation, thus arriving at the desired transformation function. This transformation function is expected to take a portion of the audio captured by the acquisition system and convert it into a signal as it would be heard by a normal human head. In order to obtain this transformation function, a dummy or model head is used and the HRTF of this head is calculated. Equivalent measurements are made on the acquisition system in order to model and obtain an MFS Transfer Function (MFSTF). From the HRTF and MFSTF information, common parameters such as the typical binaural cues are learned, and these are then used in the transformation function to approximate the binaural audio. In this sense, the desired transformation function is obtained by means of a training and validation process carried out with pairs of results of each transfer function (HRTF and MFSTF) and their parameters, calculated for sounds emitted from the same relative angular location. The process then approximates the MFSTF of said system to the HRTF, obtaining the Binaural Transformation (BT) that converts the acquired signal into a binaural reproduction signal. Thus, the system and method make up an integral design that is capable of recording audio and then processing it for reproduction as a three-dimensional sound environment.

BRIEF DESCRIPTION OF THE FIGURES

The nature of the invention will be better understood from the following detailed description of several specific embodiments, given only by way of example, with reference to the accompanying drawings, in which:

Figure 1 shows a listening diagram for a specific sound emitted from the source point S, where $X_l$ is the input signal received by the left ear and $X_r$ is the input signal received by the right ear.

Figure 2 shows a block diagram of the processing performed on the audio signals, which is the basis of the decoding process to find the binaural representation.

Figure 3 shows a schematic view of one of the preferred embodiments of the invention.

DETAILED DESCRIPTION

The present invention describes a binaural recording method and system capable of recording sound and decoding its spatial characteristics, which, when reproduced with headphone-type devices, offers a three-dimensional representation of the recorded sound scene. Said system and method offer a solution of reduced size, preferably of millimeter dimensions, which can be applied and/or integrated into professional, domestic or portable devices such as cell phones, among others.

In one embodiment of the invention, said recording system consists of at least two APAF acoustic filter units, where each filter unit has a sound sensor unit in its interior, for example a microphone that transforms the acoustic signal or sound wave into an electrical or audio signal. The arrangement of the APAF units is known as an acoustic filter device or APAF device. Each APAF unit (the units being physically separated in one embodiment) minimizes the angular (spherical) correlation of the response to any pair of identical sounds with frequencies in the human range, measured from the sound sensor unit located within each acoustic filter unit. Preferably, said sound sensor unit consists of an omnidirectional microphone with high gain in proportion to its size. In a preferred embodiment of the invention, the acoustic filter units and sound sensors used are of millimeter dimensions, which facilitates their integration into existing recording devices such as professional, portable or domestic appliances. The present invention assumes that the frequency curve of the sound sensor unit used is not excessively different from that of the conventional microphones used in recording studios, so that any non-linearity in the response may be corrected by later equalization stages. The construction of an APAF device comprises materials commonly used in the production of video cameras, smartphones and their accessories, microphones, and acoustic absorption and acoustic resonance elements, the objective being to obtain the minimum angular correlation for a pair of identical sounds emitted from sources located in different angular positions, measured by an average index of the result obtained for each pair of sounds.

The main objective of the acoustic filter device is thus to preserve the sound location information contained in an audio signal, which is provided by the effects of the variation of the sound spectrum and of the sound level of the received sound wave. Also, due to the spatial separation of the sound sensor units, there is a variation in the time at which sound events are acquired by each sound sensor, and this time difference is therefore also contained in the audio signals as information.
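
The text does not define the average index used to quantify angular correlation. The following sketch shows one plausible reading, stated here as an assumption: the zero-lag normalized correlation between the responses measured for every pair of distinct directions, averaged in absolute value. The function names and the three toy responses are hypothetical.

```python
import math
from itertools import combinations


def normalized_correlation(a, b):
    """Zero-lag normalized correlation of two equal-length responses."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_angular_correlation(responses):
    """Average |correlation| over every pair of distinct directions.

    `responses` maps a direction (azimuth, elevation) in degrees to the
    response measured at the sensor for the same test sound emitted from
    that direction. Lower values mean directions are easier to tell apart.
    """
    pairs = list(combinations(responses.values(), 2))
    return sum(abs(normalized_correlation(a, b)) for a, b in pairs) / len(pairs)


# Hypothetical responses for three directions (illustrative numbers only).
responses = {
    (0, 0): [1.0, 0.0, 0.0, 0.0],
    (90, 0): [0.0, 1.0, 0.0, 0.0],
    (0, 45): [1.0, 1.0, 0.0, 0.0],
}
index = mean_angular_correlation(responses)
print(index)
```

Under this reading, a candidate filter geometry would be preferred when it yields a lower index over the whole sampling grid.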

To achieve the variation of the sound spectrum, the APAF device offers:

• A variable density system, which varies parametrically with angle (elevation and azimuth);

• Cancellation/enabling of certain frequencies, achieved through channels of different lengths along which the sound travels before reaching the microphone (similar to how a directional microphone works, that is, cancelling sound from certain directions through phase cancellation).

The level variation is achieved naturally by the spatial separation of both microphones, and by the absorption/dissipation experienced by the sound as it passes through the device.

An APAF device whose APAF units have each been coupled to a sound sensor unit or microphone constitutes a subsystem called the microphones-filter system (MFS) or acquisition system, which is one of the central elements of the sound recording of the present invention. For the sound recorded by the MFS to be recorded and/or reproduced binaurally, a decoder device is used that translates or transforms the signal recovered by the sound capture device into a three-dimensional signal like the one heard by the human ear. For this purpose, the decoding device consists of means for storing, calculating and processing information, such as the acoustic measurements made on the MFS and on a dummy head system, and applies a transformation to the MFS output signal to obtain a binaural signal for reproduction on headphone-type devices or conventional earphones.

In this context, the acoustic measurements made to both the MFS and the dummy head to determine the transformation function are made based on a sound sample that contains all the frequencies that the human being can hear, where said sample can be of the type white noise, impulses or sinusoidal sweep.
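
The three kinds of excitation named above can be generated as in the following sketch. The exponential sweep formula is the standard one for a sine sweep whose frequency rises exponentially; the 20 Hz to 20 kHz range, the one-second duration and the 48 kHz sample rate are assumptions chosen for illustration, as are the function names.

```python
import math
import random


def exponential_sweep(f_start, f_end, duration, sample_rate):
    """Sine sweep whose frequency rises exponentially from f_start to f_end."""
    n_samples = int(duration * sample_rate)
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    return [math.sin(scale * (math.exp(rate * n / n_samples) - 1.0))
            for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Gaussian white noise, seeded so that a measurement can be repeated."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def unit_impulse(n_samples):
    """A single unit sample followed by silence."""
    return [1.0] + [0.0] * (n_samples - 1)


# Assumed audible range and sampling frequency.
sweep = exponential_sweep(20.0, 20000.0, duration=1.0, sample_rate=48000)
noise = white_noise(1024, seed=1)
impulse = unit_impulse(8)
print(len(sweep), len(noise), impulse)
```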

Once the sound sample with which both the MFS and the dummy head will be measured has been determined, a sampling grid is defined, which may have characteristics similar to those used by V. R. Algazi et al. ("The CIPIC HRTF database", in Proc. 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2001), New Paltz, NY, USA, October 2001), that is, N = 1250 sound-emitting points, which is considered a standard in the academic world for HRTF measurements. However, any type of grid suitable for this type of measurement can be used.
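
A grid of N = 1250 points can be built as 25 azimuths by 50 elevations. The sketch below follows the interaural-polar layout commonly documented for the CIPIC database; the specific angle values are stated here as an assumption about that layout and are not given in the present text.

```python
# Assumed CIPIC-style interaural-polar grid, angles in degrees.
# Azimuths are denser near the median plane and sparser towards the sides.
azimuths = [-80, -65, -55] + list(range(-45, 50, 5)) + [55, 65, 80]

# Elevations advance in steps of 360/64 = 5.625 degrees, starting at -45.
elevations = [-45.0 + 5.625 * k for k in range(50)]

# Every (azimuth, elevation) pair is one sound-emitting position.
grid = [(az, el) for az in azimuths for el in elevations]
print(len(azimuths), len(elevations), len(grid))
```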

Subsequently, samples are taken with the dummy head at the N defined locations and, at the same N locations, with the MFS or acquisition system, and the samples taken are analyzed, iteratively, for correlation between signals. In this sense, the experimental approach used is to generate impulses (or impulse-type signals such as a "sinusoidal sweep" or a maximum length sequence, "MLS") based on a semi-uniform arrangement of elevations and rotations in the horizontal plane (azimuth) around the center of a sphere. That is, the signals emitted from the spherical arrangement are measured for both the dummy head and the binaural acquisition system. The dummy head measurements then serve to obtain the HRTF corresponding to the dummy head and its descriptive parameters. The equivalent measurements on the acquisition system are made to model and obtain the transfer function of said system (MFSTF) and its descriptive parameters. Finally, both transfer functions are correlated with each other, using a supervised learning method to obtain the transformation function that expresses this correlation.
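
A maximum length sequence can be produced with a linear feedback shift register. The sketch below is a minimal illustration with a 4-bit register, which is far shorter than what a real measurement would use; the function name, register length and tap positions are choices made for this example only.

```python
def mls(n_bits, taps):
    """One period (2**n_bits - 1 samples) of a +/-1 maximum length sequence.

    `taps` lists the 1-based register positions XORed to form the feedback
    bit; they must correspond to a primitive polynomial for the sequence
    to reach the full period.
    """
    state = [1] * n_bits
    sequence = []
    for _ in range(2 ** n_bits - 1):
        sequence.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return sequence


def circular_autocorrelation(x, lag):
    """Circular autocorrelation of `x` at the given lag."""
    n = len(x)
    return sum(x[i] * x[(i + lag) % n] for i in range(n))


sequence = mls(4, taps=(4, 3))
print(len(sequence), circular_autocorrelation(sequence, 0),
      circular_autocorrelation(sequence, 1))
```

The circular autocorrelation of such a sequence equals its length at lag zero and -1 at every other lag, which is what makes it behave like an impulse in transfer function measurements.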

In this context, the transformation function of the decoding device is obtained from the acoustic measurements taken with the dummy head (HRTF) and from the transfer function of the acquisition system (MFSTF), in conjunction with the artificial intelligence algorithm that approximates the function mapping one response to the other through their descriptive parameters.

As indicated above, the transformation function in the decoding device is an approximation function obtained by supervised learning (or an equivalent machine learning technique). In one embodiment of the invention, the input signal to the learning method comes from the microphones of the system, representing the sound waves $X_l$ and $X_r$ as shown in Figure 1 for the case of two sensors, and the output signal is represented by $Y_l$ and $Y_r$.

Said artificial intelligence system comprises a learning algorithm programmed in computer systems or software, where said learning algorithm is parameterized so that the best combination of parameters can be evaluated in the validation stage. In this sense, the steps of the algorithm can be summarized as follows:

1. The acquisition of the input signals of the acquisition system, in which said signals are treated as a vector of the type $(X_l, X_r)$ (or $(X_1, X_2, \ldots, X_N)$ if several microphones are used in an embodiment of the acquisition system).

2. The analysis and separation of the input signal segments that describe the relevant sound events.

3. The pre-processing of the segments and the obtaining of descriptive parameters of directionality. Directionality descriptors provide estimated information about the possible origins of the signal and allow this information to be used in the next steps, in which each output signal (left and right) is synthesized using one or more transformations obtained in the validation and training stage, which minimizes the margin of error.

4. The convolution of the segments with the transfer functions determined according to step 2. Preferably, this convolution is performed in the time domain, although an element-wise product of the vectors in the frequency domain, or an equivalent operation in another domain, can also be used. For the system to work in real time, an overlap-add convolution method can be used so that this process does not introduce a delay noticeable to the user.

5. The output is the reconstruction of the separated audio segments at their original positions.
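
The overlap-add method mentioned in step 4 can be sketched as follows. The input is filtered block by block and the tail of each filtered block is added to the start of the next, so output can be produced as soon as each block arrives. The function names and block size are hypothetical, and a direct convolution stands in for the FFT-based block convolution a real-time system would normally use.

```python
def convolve(x, h):
    """Direct linear convolution (stand-in for an FFT-based block convolution)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def overlap_add(signal, impulse_response, block_size):
    """Filter `signal` block by block, summing the overlapping tails."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        # Each filtered block is longer than the block itself; the excess
        # overlaps the region written by the following block.
        for i, value in enumerate(convolve(block, impulse_response)):
            out[start + i] += value
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [0.5, 0.25, 0.125]
streamed = overlap_add(signal, h, block_size=3)
reference = convolve(signal, h)
print(streamed == reference)
```

Because convolution is linear, the block-wise result matches the convolution of the whole signal, while the latency is bounded by the block size rather than by the length of the recording.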

Since the software of the invention is integrated in the small-sized system, the previous steps must be performed in a microprocessor in order to have an independent solution, with an analog-to-digital converter of high sampling frequency and a digital-to-analog output converter. Alternatively, the previous steps can be incorporated into the memory of the devices that house them, for example, portable devices that already have a framework to communicate with their processor, memory, analog-to-digital converters, data buses, etc.

As described above, the validation and training of the artificial intelligence system consists of establishing a mathematical correlation between the transfer functions of the signals captured by the dummy head (generic HRTF) and the transfer functions of the signals captured with the MFS (MFSTF). To establish this correlation, mathematical indexes are applied that allow the decoding performance to be evaluated in terms of the correlation obtained and the sensation of immersion in the listener.

Then, the best combination of parameters for the transformation is established, yielding the transformation function that best converts the output signal of the MFS into a binaural reproduction signal. This transformation function is called Binaural Transformation (BF).

In this context, by way of example and only for the frequency domain, the validation and training step seeks to make the following equality hold for each location θ in the measurement grid:

$$H_{\theta}^{\mathrm{MFS}} \cdot BF_{\theta} = H_{\theta}^{\mathrm{dummy}}$$

Where:

$H_{\theta}^{\mathrm{MFS}}$ corresponds to the parameterized transfer function of the acquisition system, for the position θ, which will be correlated with the transfer function of the dummy head.

$BF_{\theta}$ corresponds to the Binaural Transformation described above.

$H_{\theta}^{\mathrm{dummy}}$ is the HRTF parameterized for the dummy head, for position θ.
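As a rough illustration of the equality above, the Binaural Transformation for one position can be estimated bin by bin in the frequency domain. The sketch below is an assumption, not the learning procedure of the invention: it uses a regularized division (the small constant `eps` is a hypothetical safeguard against near-zero bins), and the names `estimate_bf` and `max_residual` as well as the sample spectra are invented for the example.

```python
from typing import List, Sequence


def estimate_bf(h_mfs: Sequence[complex], h_dummy: Sequence[complex],
                eps: float = 1e-9) -> List[complex]:
    """Estimate BF per frequency bin so that h_mfs * BF approximates h_dummy.

    Uses BF = h_dummy * conj(h_mfs) / (|h_mfs|^2 + eps), which reduces to
    h_dummy / h_mfs when eps is zero and the bin is non-zero.
    """
    if len(h_mfs) != len(h_dummy):
        raise ValueError("both transfer functions need the same number of bins")
    return [hd * hm.conjugate() / (abs(hm) ** 2 + eps)
            for hm, hd in zip(h_mfs, h_dummy)]


def max_residual(h_mfs: Sequence[complex], bf: Sequence[complex],
                 h_dummy: Sequence[complex]) -> float:
    """Largest absolute deviation from the target equality over all bins."""
    return max((abs(hm * b - hd) for hm, b, hd in zip(h_mfs, bf, h_dummy)),
               default=0.0)


# Hypothetical three-bin spectra for a single position, for illustration only.
h_mfs_example = [1 + 1j, 2 + 0j, 0.5j]
h_dummy_example = [2 + 0j, 1 - 1j, 1 + 0j]
bf_example = estimate_bf(h_mfs_example, h_dummy_example, eps=0.0)
```

The residual returned by `max_residual` is one possible index of how closely the equality is met for a given position; repeating the estimate over every position of the measurement grid would give one transformation per location.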

That is, the validation and training stage is carried out in order to find the value of BF_θ that achieves the equality of the equation described above for all angular positions. For example, in an embodiment where the acquisition system has two microphones, the parameterized transfer functions of said system could be represented as coefficients that describe the Interaural Time Difference (ITD) for the left and right microphones. This parameterized transfer function can be easily approximated to an HRTF that has been parameterized in the same way, in which case the transformation function would be a monotonic function that maps the ITDs captured by the acquisition system to the corresponding ITDs that occur in a human head. The audio signals picked up by the acquisition system, or in another embodiment a filtered version of these, can be fed to the transformation function, which would generate an approximate binaural representation of the input audio signal.
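The monotonic ITD mapping described in the two-microphone example can be sketched as a piecewise-linear interpolation between calibration pairs. This is a minimal sketch under assumptions: the function name `map_itd`, the clamping outside the calibrated range and the calibration values (in microseconds) are hypothetical and are not measurements from the invention.

```python
from bisect import bisect_right
from typing import Sequence


def map_itd(itd_device: float, device_points: Sequence[float],
            head_points: Sequence[float]) -> float:
    """Map an ITD measured on the device to the ITD expected on a human head.

    device_points must be strictly increasing and head_points non-decreasing,
    which makes the interpolated mapping monotonic. Values outside the
    calibrated range are clamped to the nearest end point.
    """
    if len(device_points) != len(head_points) or len(device_points) < 2:
        raise ValueError("need at least two matching calibration pairs")
    if any(a >= b for a, b in zip(device_points, device_points[1:])):
        raise ValueError("device_points must be strictly increasing")
    if any(a > b for a, b in zip(head_points, head_points[1:])):
        raise ValueError("head_points must be non-decreasing")
    if itd_device <= device_points[0]:
        return head_points[0]
    if itd_device >= device_points[-1]:
        return head_points[-1]
    k = bisect_right(device_points, itd_device)
    x0, x1 = device_points[k - 1], device_points[k]
    y0, y1 = head_points[k - 1], head_points[k]
    return y0 + (y1 - y0) * (itd_device - x0) / (x1 - x0)


# Hypothetical calibration: closely spaced microphones produce small ITDs,
# which are stretched to the larger ITDs of a human head (microseconds).
device_itds = [-40.0, 0.0, 40.0]
head_itds = [-650.0, 0.0, 650.0]
mapped = map_itd(20.0, device_itds, head_itds)
```

In this sketch the calibration pairs stand in for what the training and validation stage would produce; any other monotonic fit could be substituted without changing the idea.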

In the case of multiple microphones, the output can be written as a linear combination of the inputs and transfer functions that must be estimated in the training and validation stage, using for example the following equation:

$$Y_l = C_1 \cdot X_1 \cdot H_1 + C_2 \cdot X_2 \cdot H_2 + \dots + C_N \cdot X_N \cdot H_N$$

where $X_i$ is the i-th input signal, $H_i$ is the i-th associated parameterized function and $C_i$ is the i-th weighting parameter. Each weighting parameter is thus associated with one input signal and is directly related to the correlation with the HRTF function.

Figure 2 shows a generalization of the methodology described above to obtain a binaural audio signal according to the invention.
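The linear combination for the multi-microphone case can be sketched as a weighted sum of filtered inputs. The sketch assumes that each input is combined with its parameterized function by time-domain convolution (a product would be used instead in the frequency domain); the names `convolve` and `mix_output` and the sample signals, filters and weights are hypothetical.

```python
from typing import List, Sequence


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct time-domain linear convolution of x with h."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def mix_output(inputs: Sequence[Sequence[float]],
               filters: Sequence[Sequence[float]],
               weights: Sequence[float]) -> List[float]:
    """Return the sum over i of weights[i] * (inputs[i] convolved with filters[i])."""
    if not (len(inputs) == len(filters) == len(weights)):
        raise ValueError("inputs, filters and weights must have the same length")
    length = max((len(x) + len(h) - 1
                  for x, h in zip(inputs, filters) if x and h), default=0)
    y = [0.0] * length
    for x, h, c in zip(inputs, filters, weights):
        for k, value in enumerate(convolve(x, h)):
            y[k] += c * value
    return y


# Two hypothetical microphone signals with their filters and weights.
left_output = mix_output(
    inputs=[[1.0, 0.0], [0.0, 1.0]],
    filters=[[1.0, 1.0], [2.0]],
    weights=[0.5, 1.0],
)
```

One output channel is produced per call, so a binaural pair would be obtained by calling `mix_output` twice with the filters and weights estimated for the left and right outputs respectively.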

Based on the foregoing, according to the embodiment of Figure 3, the binaural recording system (1) object of the present invention follows an operating methodology that consists of recording the sound using at least a pair of sound sensor devices (2) or microphones, wherein the sound sensor devices are enclosed in an acoustic filter device or APAF (3), the combination of said units constituting the microphone-filter system or acquisition system (4). Additionally, the recording stage comprises the standard digital audio recording processes, that is, pre-amplification, anti-aliasing filtering, sampling, analog-to-digital conversion, decoding and storage, among others.

Subsequently, the output audio signal of the acquisition system is processed in a decoder device (5) belonging to the binaural recording system, which applies the Binaural Transformation (BF) to said signal and, preferably, stores it in at least one storage unit, converting it into a pair of signals that can be perceived by a person who listens to them by means of conventional headphones or earphones (6) and, in an alternative embodiment, by means of conventional stereo speakers. In fact, although the desired binaural effect cannot be obtained with conventional stereo speakers, the method and system of the invention make it possible to improve the stereophonic depth of the audio signals on conventional speakers, thus improving the sound image and the surround experience of different conventional audio systems. The Binaural Transformation (BF) applied by the decoding device is obtained from the analysis of the transfer functions for the acquisition system and for a dummy head, as indicated in the preceding paragraphs.

In one embodiment of the invention, the acquisition system can be replaced by a plurality of spatially separated sound sensor devices or microphones, which together are used to minimize the angular (spherical) correlation of the response to any pair of identical sounds with frequencies in the human hearing range. The transfer function of the system composed of the plurality of microphones is then obtained with the aim of correlating said function with the associated parameterized HRTF, thereby obtaining the Binaural Transformation (BF) that the decoding device applies to the signal to convert it into a binaural listening signal that recreates the real three-dimensional environment as sounds are captured by the human ear. The differences in time, phase and intensity that occur across the arrangement of the plurality of sound sensor devices provide enough information to determine the position of a sound source from the Binaural Transformation (BF), so the acquisition system of the invention can be replaced by said arrangement. In this context, a plurality of spatially separated sound sensor devices or microphones is understood to mean that the system is composed of 3 or more of said devices.
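The time and intensity differences between two microphones of such an arrangement can be estimated, for illustration, with a brute-force cross-correlation and an RMS level ratio. This is a sketch under assumptions and not the method of the invention: the names `estimate_delay` and `level_difference_db`, the `max_lag` search range and the sample signals are hypothetical.

```python
import math
from typing import Sequence


def estimate_delay(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the lag in samples that maximizes the cross-correlation of a and b.

    A positive result means that b is a delayed copy of a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[n] * b[n + lag]
                    for n in range(len(a)) if 0 <= n + lag < len(b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def level_difference_db(a: Sequence[float], b: Sequence[float]) -> float:
    """Level of a relative to b in decibels, based on RMS values."""
    rms_a = math.sqrt(sum(v * v for v in a) / len(a))
    rms_b = math.sqrt(sum(v * v for v in b) / len(b))
    if rms_a == 0.0 or rms_b == 0.0:
        raise ValueError("level difference is undefined for a silent signal")
    return 20.0 * math.log10(rms_a / rms_b)


# Hypothetical pair of signals: mic_b receives the same event two samples later.
mic_a = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
delay_samples = estimate_delay(mic_a, mic_b, max_lag=4)
level_db = level_difference_db(mic_a, mic_b)
```

With three or more microphones, the same pairwise estimates could be computed for every pair of sensors, giving a set of descriptors from which a direction can be inferred.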

In another embodiment of the invention, the acquisition system, i.e. the APAF device enclosing the sound sensor units, is integrated into a portable device such as a video camera, photographic camera, smartphone, tablet and/or smart watch, or any other type of device for binaural recording, wherein said set forms a new acquisition system. In this context, the binaural transformation applied by the decoding device is adapted to the geometric or structural configuration of the new acquisition system, consisting of a portable device that integrates in its structure at least two spatially separated microphones, wherein said binaural transformation allows the sound wave captured by the acquisition system to be decoded so as to recreate the real three-dimensional environment as sounds are perceived by the human ear. In this scenario, there is a time difference and a difference in sound intensity that, together with the physical object (the device itself) that separates both microphones and that in this case acts as the acoustic filter device or APAF device, produce a difference in the frequency content (sound spectrum) that each microphone captures for a given sound. The sound spectrum is further altered by the support of the device, for example the user holding it: the user's torso, head, arms and hands affect the sound depending on how the device is held, either in a vertical or horizontal position. This difference in spectrum is also taken into account in the methodology for the decoding and processing of binaural sound. In this way, the decoding device has all the information needed to decode the spatial information that is already present in these two microphones and transform it into a human binaural signal.

Additionally, the methodology of the invention can be applied to existing devices as long as they already have at least two integrated microphones. After obtaining the transfer function of said device-microphones system, or MFSTF, considering the device as a filter, and approximating it to the associated HRTF, the transformation function is obtained that must be applied by the decoding device to convert the input signal into a binaural signal for listening in three dimensions. In this embodiment, the acquisition system could be part of the common components of a portable device that has at least two sound sensors or microphones, in which the structure of the portable device and its support act as the acoustic filter device enveloping at least two of the sound sensors or microphones mentioned.

With respect to the decoding device, in a preferred embodiment of the invention, said device is integrated into the binaural recording system and, alternatively, into the reproduction system, wherein in a convenient embodiment, said decoding device can be implemented in a computer program previously included in a device or that can be installed in the storage memory of the same. Said program includes the implementation of the decoding algorithm based on the learning algorithm that allows obtaining the transfer function of the acquisition system or MFSTF, correlated with the generic HRTF, with the aim of converting the output signal of the acquisition system into a binaural reproduction signal. This objective is fulfilled by obtaining a Binaural Transformation (BF) that allows the signal of a specific acquisition system (microphones-filter, microphones-device, plurality of microphones, among others) to be converted into a stereo binaural signal for listening through conventional headphones or earphone-type sound reproduction devices.

Then, the computer program for binaural recording and, alternatively, reproduction could consist of: information storage means to store the information coming from the sound waves captured by a reception device in at least one storage unit; information processing means to obtain the relevant parameters of the stored information; comparison means to correlate the parameters of the stored information with a parameterized HRTF previously stored in at least one storage unit; information processing means to obtain the Binaural Transformation (BF) and apply it to the stored information; and storage means to save the binaural transformation of the stored information for its later retrieval and reproduction, if necessary.

Claims

1. A reduced-size system for binaural recording that includes:

- an acquisition system for recovering sound waves acoustically filtered from the environment and converting them into audio signals;

- a decoding device that receives the audio signals from the acquisition system and that converts them into a binaural signal that recreates a three-dimensional listening environment for the user.

2. The reduced-size system for binaural recording of clause 1, wherein the acquisition system comprises at least two spatially separated sound sensors or microphones, located within an acoustic filter device, where the physical arrangement of the acoustic filter and the microphones is called Microphone-Filter System (MFS).

3. The reduced-size system for binaural recording of clause 1, wherein the acquisition system comprises a plurality of spatially separated sound sensors or microphones.

4. The reduced-size system for binaural recording of clause 1, wherein the acquisition system comprises at least two sound sensors or microphones incorporated in a portable device or any other device for recording and / or reproduction, where the structure of the portable device and its support act as an acoustic filter device.

5. The reduced-size system for binaural recording of clause 1, wherein the decoding device comprises means for storing, calculating and processing information to obtain a binaural transformation function that approximates a parametrized transfer function of the acquisition system (MFSTF) to a parametrized generic head-related transfer function (HRTF).

6. The reduced-size system for binaural recording of clause 5, wherein the decoding device further comprises means for storing, calculating and processing information to apply the binaural transformation function to the received audio signal, transforming it into the binaural signal.

7. The reduced-size system for binaural recording of clause 1, wherein the system further comprises a headphone-type playback device, used for the reproduction of the binaural signal.

8. The reduced-size system for binaural recording of clause 1, wherein the recording system further comprises conventional stereo speakers, used for the reproduction of the binaural signal.

9. System of reduced size for binaural recording that consists of:

- an acquisition system for recovering sound waves acoustically filtered from the environment and converting them into audio signals, wherein the acquisition system comprises at least two spatially separated sound sensors or microphones, located inside an acoustic filter device, where the physical arrangement of the acoustic filter and the microphones is called Microphone-Filter System (MFS);

- a decoder device that receives audio signals from the acquisition system and converts them into a binaural signal that recreates a three-dimensional listening environment for the user, wherein the decoding device comprises means for storing, calculating and processing information to obtain the binaural transfer function that approximates a parametrized transfer function of the acquisition system (MFSTF) to a parametrized generic HRTF and to apply said binaural transfer function to the received audio signal, transforming it into the binaural signal.

10. The reduced-size system for binaural recording of clause 9, wherein the acoustic filter device comprises at least two acoustic filter units, each one enclosing each of the at least two spatially separated sensor devices or microphones.

11. The reduced-size system for binaural recording of clause 10, wherein the acoustic filter units are physically separated.

12. The reduced-size system for binaural recording of clause 9, where the microphones are omnidirectional and of high gain in proportion to their size.

13. The system of reduced size for binaural recording of clause 9, where the acquisition system is of millimeter dimensions, facilitating its integration to existing recording devices.

14. The reduced-size system for binaural recording of clause 9, wherein the acoustic filter device is constructed with a variable density, which varies angularly and parametrically, and with channels of different lengths along which the sound travels before reaching the microphone, canceling and/or boosting certain frequencies.

15. The reduced-size system for binaural recording of clause 9, wherein the acquisition system is part of a portable device, the structure of the portable device and its support acting as the acoustic filter device.

16. The reduced-size system for binaural recording of clause 9, wherein the recording system further comprises a headphone-type playback device, used for the reproduction of the binaural signal.

17. The reduced-size system for binaural recording of clause 9, wherein the recording system further comprises conventional stereo speakers, used for the reproduction of the binaural signal.

18. Small-sized system for binaural recording consisting of:

an acquisition system for recovering sound waves acoustically filtered from the environment and converting them into audio signals, wherein the acquisition system comprises at least two spatially separated sound sensors or microphones, located inside an acoustic filter device, where the physical arrangement of the acoustic filter and the microphones is called Microphone-Filter System (MFS);

a decoder device that receives the audio signals from the acquisition system and that converts them into a binaural signal that recreates a three-dimensional listening environment for the user, wherein the decoding device consists of means to store, calculate and process information to obtain the binaural transfer function that approximates a parametrized transfer function of the acquisition system (MFSTF) to a parametrized generic HRTF and to apply said binaural transfer function to the received audio signal, transforming it into the binaural signal; wherein the acquisition system is part of the common components of a portable device consisting of at least two sound sensors or microphones, wherein the structure of the portable device acts as the acoustic filter device enveloping at least two of the aforementioned sound sensors or microphones; and

where the decoding device is implemented in a computer program previously included in the portable device or that can be installed in its storage memory.

19. The reduced-size system for binaural recording of clause 18, wherein the recording system further comprises a headphone-type playback device, used for the reproduction of the binaural signal.

20. The reduced-size system for binaural recording of clause 18, wherein the recording system further comprises conventional stereo speakers, used for the reproduction of the binaural signal.

21. The reduced-size system for binaural recording of clause 18, wherein the acoustic filter device further comprises the support that supports the structure of the portable device, which in one embodiment is the user holding said device.

22. Binaural recording method consisting of the following stages:

recovering sound waves acoustically filtered from the environment by an acquisition system, converting them into audio signals;

transmitting the audio signals from the acquisition system to a decoding device; and

processing the audio signals in the decoding device, converting them into binaural signals.

23. The binaural recording method of clause 22, wherein the step of processing the audio signals comprises:

receiving the audio signal from the acquisition system;

obtaining the parameterized transfer function of said system (MFSTF);

correlating said parameterized transfer function with the parameterized generic HRTF; and

obtaining the binaural transformation.

24. The binaural recording method of clause 22, wherein the step of recovering sounds acoustically filtered by the acquisition system comprises minimizing the angular (spherical) correlation of the response to any pair of identical sounds with frequencies in the human range, coming from sources located in different angular positions.

25. The binaural recording and reproduction method of clause 22, wherein the step of recovering the sounds acoustically filtered by the acquisition system comprises:

preserving the information of the location of the sound contained in the audio signal, which provides the effects of the variation of the sound spectrum and the variation of the sound level of the recovered sound wave; and

keeping the variation in the time information in which the sound events are acquired by the acquisition system.

26. The binaural recording method of clause 23, wherein the step of processing the audio signals further comprises applying the binaural transformation to the received signal, generating the binaural signal.

27. The binaural recording and reproduction method of clause 25, wherein the conserved information considers the geometric configuration of the acquisition system in conjunction with the alterations caused by the user that is holding the acquisition system, in which both characteristics allow the difference in frequency content (sound spectrum) and the difference in time that the acquisition system recovers for a certain sound.

28. The binaural recording method of clause 27, characterized in that it is implemented in a portable device, such as in a smart phone or the like.

29. A binaural recording method that consists of:

- recover acoustic sound waves from the environment by means of an acquisition system and convert them into audio signals, in which said recovery consists of,

- minimize the angular (spherical) correlation of the response to any pair of identical sounds with frequencies in the human range, coming from sources placed in different angular positions;

- preserve the information of the location of the sound contained in the audio signal, providing the effects of the variation of the sound spectrum and the variation of the sound level of the recovered sound wave; and

- keep the variation in the time information in which the sound events are acquired by the acquisition system;

transmitting the audio signals from the acquisition system to a decoding device;

processing the audio signals in a decoding device, converting them into binaural signals, wherein said processing consists of

- reception of the acquisition system signal;

- obtaining the parameterized transfer function of said system;

- the correlation of said parameterized transfer function with the generic HRTF;

- obtaining the binaural transformation; and

- the application of the binaural transformation to the received signal, generating the binaural signal.

30. The binaural recording and reproduction method of clause 29, wherein the conserved information considers the geometric configuration of the acquisition system in conjunction with the alterations caused by the user that is holding the acquisition system, in which both characteristics allow the difference in frequency content (sound spectrum) and the difference in time that the acquisition system receives for a particular sound.

31. The binaural recording method of clause 30, characterized in that it is implemented in a portable device, such as in a smart phone or the like.

32. A computer program for binaural recording that includes:

information storage means for storing information from acoustic signals or sound waves recovered by a pick-up device or acquisition system in at least one storage unit;

information processing means to obtain the parameterized transfer function of the stored information;

comparison means for correlating the parameterized transfer function of the stored information to a generic parametrized HRTF previously stored in at least one storage unit;

information processing means to obtain the binaural transform and apply it to the stored information;

storage means to store the binaurally transformed stored information for later retrieval and reproduction.

33. A computer program for binaural recording, comprising the implementation of the method of claims 22 or 29 in a portable device.

34. Manufacturing process of a system of reduced size for binaural recording comprising the steps of:

provide an acquisition system that minimizes the angular correlation of the response to any pair of identical sounds;

define a type of sample to be considered to measure the impulse response, covering all the desired frequencies that the user can listen to;

measure the transfer function related to the acquisition system (MFSTF), using the defined sample;

measure the transfer function related to a dummy head (HRTF), using the defined sample;

develop an algorithm for decoding information retrieved by the acquisition system, selecting the transformation function that best approximates the function that maps the responses to both parameterized transfer functions;

train and validate the selected transformation function, establishing a mathematical correlation between the transformations of the signals captured by the dummy head and by the acquisition system;

establish mathematical indexes that allow evaluating the performance of the decoding in terms of the correlation achieved and the sensation of immersion of the listener;

select the binaural transformation to be applied in a decoding device.

35. Method of manufacturing a small-sized system for binaural recording according to claim 34, wherein the steps of measuring the transfer function related to the acquisition system (MFSTF) and the transfer function related to a dummy head (HRTF) comprise sampling with the dummy head at N determined locations and with the acquisition system at the same N locations, where a correlation analysis is made between the signals, iterating over these samples.

36. Method of manufacturing a small-sized system for binaural recording according to claim 35, wherein the recording system is integrated and / or is part of a portable device, such as a smartphone.