WO2010086321A1

WO2010086321A1 - Binaural audio guide

Info

Publication number: WO2010086321A1
Application number: PCT/EP2010/050912
Authority: WO
Inventors: Ivan Portas Arrondo
Original assignee: Auralia Emotive Media Systems, S.L.
Priority date: 2009-01-28
Filing date: 2010-01-27
Publication date: 2010-08-05
Also published as: EP2214425A1

Abstract

Binaural audio guide (1), preferably for use in museums, which provides users (20) with information about the objects (17) around them, in such a manner that the information provided seems to come from the specific objects (17) relative to which it informs.

Description

BINAURAL AUDIO GUIDE

OBJECT OF THE INVENTION

The main object of this invention is a binaural audio guide that is noninvasive with the environment, which provides users with information on the objects around them through headphones. The acoustic information relative to each object seems to come from the object itself and therefore the information received via the headphones depends on the position and orientation of the user's head. This audio guide is especially useful in museums or similar places.

BACKGROUND OF THE INVENTION

Audio guides that offer tourist information to users based on their location are known in the state of the art. Depending on their type, they can operate outdoors (for example, in a city) or indoors (for example, in a museum). All of these systems comprise a device transported by the user and a network of information points located next to each object relative to which information is requested, whether a monument in the case of the outdoor type or a work of art in the case of the indoor type. The information points detect whether the user is in the vicinity of the object and, if so, reproduce an audio file relative to the monument or painting in question. The audio files may be stored in the device transported by the users or at each information point, in which case they are transmitted to the user when reproduced.

An inconvenience of these systems is their high cost, as they require the prior installation of the information points so the system can operate. Additionally, some environments such as museums and similar do not allow the installation of information points for artwork security reasons or for fear of endangering the surroundings. Another inconvenience is that the user cannot intuitively identify the painting or monument relative to which they are receiving the information, due to the lack of context thereof. For example, in the case of the museum, there could be several works of art within the same room, which can create confusion in the user.

Confusion can also arise due to the lack of precision in the localization of the user. Normally, the user is only located based on the scope of the means of communication between the device transported by the user and each information point. For example, infrared media are used indoors to ensure that the user is inside a specific room, or media such as Bluetooth or Wi-Fi which have a known range. However, this provides very little precision, which frequently leads to the reproduction of audio files at times when the user is not in the required position. In addition, due to the type of Boolean detection (true or false), these systems do not solve problems such as occlusion due to physical obstacles between the issuer and the recipient, abrupt audio reproduction, etc.

DESCRIPTION OF THE INVENTION

Humans are volumetric sonorous recipients, i.e. we process the sound that reaches us through, for example, reflections created by the shoulders and torso, or diffractions created by the sound as it surrounds the user's head. That is, human hearing is binaural by nature, where the resulting sonorous reception process ends in two single channels: right ear and left ear. The term "binaural" relates to the nature of human hearing, due to the fact that people are capable of capturing all sonorous spatial information through a single pair of ears and determining the location of the sonorous sources. When this phenomenology is not taken into account the so-called "endocranial sound" is normally produced, such as for example on listening to traditional stereo sound through headphones.

Endocranial sound consists of the feeling that the sonorous sources are located inside the user's head, at a point located between the headphones, due to which traditional stereo sound is not a recommendable format when trying to represent three-dimensional sonorous spaces realistically. The invention describes an audio guide system where the audio files that provide tourist information to the user are transformed to binaural format in real time depending on the position and orientation of the user's head in relation to the position of the object relative to which information is being provided. That is, for the user the sound source seems at all times to be the object relative to which they are being informed. If the position or orientation of the user's head changes, the sound source is also displaced in real time, in such a manner that the sound continues to seem to come from the object in question. In this manner, the user instantly and intuitively knows which is the object relative to which they are receiving information. That is, in this document, the expression "reproduce a binaural file associated to an object" means that the binaural file being reproduced is such that the apparent sound source is located on said object at all times.

In order to generate the binaural audio files, localization and orientation means are used that determine the position and orientation of the user's head without the need for installing information points, in such a manner that the binaural audio guide of the invention can be used without altering the environment. This localization and orientation means obtains the location and orientation of the user's head through the detection of reference points in the environment by means of at least one image acquisition means in accordance with a MonoSLAM process (Simultaneous Localisation And Mapping process using one camera). The MonoSLAM system is superficially described in "MonoSLAM: Real-Time Single Camera SLAM," by Andrew J. Davison, Ian D. Reid, Nicholas D. Molton and Olivier Stasse, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007. Fundamentally, the MonoSLAM system obtains the position and orientation of an image acquisition means based on the detection of fixed reference points and on the modification of their relative position as the position and orientation of said image acquisition means changes. For example, in the case of the present invention, the reference points may be the corner of a painting, a smoke detector, the corner of a room, etc. Based on these data, the MonoSLAM system obtains the exact position and orientation coordinates (X, Y, Z) of the image acquisition means, and therefore of the user's head.

MonoSLAM's great accuracy allows a set of reproduction rules to be established, spatial or temporal, within a rules grammar, that determine how the audio files associated to each object will be reproduced based on user conduct. In the case of spatial rules, regions that trigger the reproduction of certain audio files, for example regions around objects, door thresholds, etc., could be defined. The reproduction volume could even be modified based on the user's position. In the case of temporal rules, time intervals that trigger the reproduction of certain audio files could be defined. For example, it could be the duration of the user's stay within the region associated to a specific object, the time they spend looking in a specific direction, etc.
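The spatial and temporal rules described above can be illustrated with a minimal Python sketch. The text does not define the rules grammar, so the class names `CircularRegion` and `DwellRule`, the circular region shape and the numeric values are all hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircularRegion:
    """Hypothetical spatial rule: a disc on the floor plan around an object."""
    cx: float
    cy: float
    radius: float

    def contains(self, x: float, y: float) -> bool:
        # True when the user's position lies inside the disc.
        return math.hypot(x - self.cx, y - self.cy) <= self.radius


@dataclass
class DwellRule:
    """Hypothetical temporal rule: fulfilled once the user has stayed inside
    `region` for at least `min_seconds` without leaving it."""
    region: CircularRegion
    min_seconds: float
    _entered_at: Optional[float] = None

    def update(self, x: float, y: float, now: float) -> bool:
        if not self.region.contains(x, y):
            self._entered_at = None      # leaving the region resets the timer
            return False
        if self._entered_at is None:
            self._entered_at = now       # first sample inside the region
        return now - self._entered_at >= self.min_seconds


# Example: a 2 m region around an object at the origin, 5 s dwell time.
rule = DwellRule(CircularRegion(0.0, 0.0, 2.0), min_seconds=5.0)
```

Calling `rule.update(x, y, now)` on every localisation cycle returns `True` only after the user has remained inside the region for the configured time, which is one possible reading of "the duration of the user's stay within the region associated to a specific object".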

In the present document, we will consider the front direction as being the direction located immediately in front of the user's face, i.e. the direction in which the user looks if they continue looking straight ahead, as opposed to the rear direction or side directions, all being comprised within a horizontal plane that crosses the user's head. Consequently, for example, a front-upper or side-upper direction in the present document comprises those directions that, starting from the front or side direction, are raised by an angle of between 0° and 90° in a vertical direction.

In this manner, a first aspect of the present invention describes a binaural audio guide which provides information to a user relative to objects located within an enclosure, for example some works of art inside the rooms of a museum.

In accordance with the invention, each object is associated to at least one audio file and reproduction rules.

The reproduction rules associated to each object are individually chosen according to each specific application. For example, and in the case of information relative to works of art inside a museum, greater regions could be assigned to the main works and smaller regions to the secondary works. Or, in the case of paintings, the associated regions could be located only in front of them, while the regions associated to sculptures could also include the space located behind them.

Depending on the design of the reproduction rules, the combination of several sound files corresponding to one or more objects is possible. The reproduction rules are not only based on the position of the user and the area in which they are, but also on the amount of time that they remain within said area.

On the other hand, the information contained in the audio files associated to each object can be of different kinds. For example, in the case of a museum, it could be a locution with artistic, historical, tourist or similar information, or atmosphere sound, dialogues or music that help the visitor to understand the work of art.

The elements that comprise the audio guide are fundamentally an upper module and a console, each of which is described below:

a) Upper module

It is a module that can be coupled to the user's head, and which comprises headphones and at least one image acquisition means to acquire images of the user's surroundings.

The function of the headphones is to reproduce binaural audio files, and therefore can be of any type provided that they can carry out said objective.

On the other hand, the images acquired by the at least one image acquisition means will be used to determine the position and orientation of the user's head based on a group of reference points detected on the enclosure walls and/or ceiling. To this end, the image acquisition means must be fixed to the upper module, and at the same time ahead of the user, in such a manner that no relative displacements take place between the image acquisition means and the user's head. With respect to the orientation, in principle the image acquisition means could be oriented in any direction provided that there are fixed reference points within its viewing angle, even though in a specific embodiment of the invention it is oriented in a front-upper direction. For the purpose of preventing the image acquisition means from suffering damage, it is preferably disposed inside a transparent dome located in the upper module, which also facilitates the cleaning thereof. Normally, the image acquisition means is a CCD or CMOS-type camera, although any type of camera can be used.

b) Console

It is a device connected to the upper module and which also comprises:

- A localization and orientation means that serves to determine the position and orientation of the user's head based on the images acquired by the at least one image acquisition means. To this end, MonoSLAM-type algorithms are used, capable of calculating the position and orientation of an image acquisition means at great speed based on the images that it obtains.

In a preferred embodiment of the invention, an additional localization and orientation means based on accelerometers and gyroscopes is also used to improve system accuracy.

- A storage means that stores audio files associated to the objects relative to which we wish to inform the user.

- A sonorous synthesis means that generates, based on the position and orientation of the user's head, time and reproduction rules associated to the objects, a binaural audio file based on at least one audio file associated to each of said objects. That is, the binaural audio file can be generated from the combination of several audio files associated to several objects. In the audio file generated, the sound corresponding to each of the objects will seem to come from the object in question.
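The dependence of the binaural signal on head position and orientation can be sketched as follows. The text does not specify how the synthesis is computed, and practical binaural renderers usually filter the audio with head-related transfer functions. The sketch below only derives two intermediate quantities under simplifying assumptions: the horizontal angle of the object relative to the user's front direction, and the interaural time difference given by the Woodworth spherical-head approximation. The head radius, the function names and the sign convention are assumptions.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius, in metres
SPEED_OF_SOUND = 343.0    # metres per second in air at about 20 degrees C


def relative_azimuth(head_xy, head_yaw, obj_xy):
    """Horizontal angle of the object relative to the user's front direction,
    in radians within [-pi, pi]. Positive values are to the user's left."""
    dx = obj_xy[0] - head_xy[0]
    dy = obj_xy[1] - head_xy[1]
    angle = math.atan2(dy, dx) - head_yaw
    # Wrap the difference back into [-pi, pi].
    return math.atan2(math.sin(angle), math.cos(angle))


def woodworth_itd(azimuth):
    """Interaural time difference in seconds (Woodworth approximation).
    The sign follows the azimuth: positive means the left ear leads."""
    theta = abs(azimuth)
    if theta > math.pi / 2:
        theta = math.pi - theta   # mirror the rear hemisphere onto the front
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))
    return math.copysign(itd, azimuth)


# User at the origin facing +x, object one metre to the left.
azimuth = relative_azimuth((0.0, 0.0), 0.0, (0.0, 1.0))
delay = woodworth_itd(azimuth)   # roughly 0.66 ms
```

When the user turns their head, `head_yaw` changes, the azimuth changes with it and so does the delay between the ears, which is what keeps the apparent source fixed on the object.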

The localization and orientation means, the sonorous synthesis means and the storage means can be implemented, jointly or individually, using any types of devices known in the state of the art, such as micro-controllers, microprocessors, FPGAs, DSPs, ASICs or similar, provided that these are capable of implementing the necessary operations for the correct operation of said means. Additionally, in preferred embodiments of the invention the binaural audio guide also comprises a removable battery, to provide it with autonomy, and an interface medium to allow users to interact with the audio guide, in order to stop or repeat the reproduction of the binaural audio files.

According to a second aspect of the invention, a description is given of an operating procedure of a binaural audio guide such as that previously described, where each object is associated to at least one audio file and reproduction rules, and which comprises the following operations:

1) Acquire, through an image acquisition means coupled to a user's head, images of said user's surroundings.

2) Process the images acquired in a localization and orientation means through a MonoSLAM process to obtain the position and orientation of the user's head.

3) Determine, based on the position and orientation of the user's head and the time, if any of the reproduction rules corresponding to any of the objects are fulfilled.

4) If so, generate, by means of a sonorous synthesis module, and based on the position and orientation of the user's head, on time and on the reproduction rules of said objects, a binaural audio file from at least one audio file associated to each object, in such a manner that the user's feeling on listening to the binaural audio file is that the sound of each object comes from said object in question.

5) Reproduce the binaural audio file through the user's headphones.

In view of the fact that the generation and reproduction of the binaural audio files is carried out in real time, as the user modifies the position and orientation of his/her head, the foregoing operations must be carried out quite frequently, preferably more than 10 cycles per second.
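How operations 1) to 5) chain together in one cycle can be sketched in Python. The five callables passed to `run_guide_cycle` are hypothetical stand-ins for the camera, the localisation means, the rule evaluation, the synthesis module and the headphone output; none of these names comes from the text.

```python
def run_guide_cycle(acquire_image, estimate_pose, fulfilled_rules,
                    synthesise, play, now):
    """One pass through operations 1) to 5).

    Returns True if binaural audio was generated and reproduced in this
    cycle, False otherwise.
    """
    image = acquire_image()                 # 1) acquire an image
    pose = estimate_pose(image)             # 2) position and orientation
    if pose is None:                        # tracking lost: skip this cycle
        return False
    rules = fulfilled_rules(pose, now)      # 3) which rules are fulfilled?
    if not rules:
        return False
    audio = synthesise(pose, rules, now)    # 4) generate the binaural audio
    play(audio)                             # 5) reproduce it
    return True
```

A caller would invoke this function repeatedly, at a rate above the 10 cycles per second indicated in the text, passing the current time as `now`.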

Additionally, a preferred embodiment of the invention includes the modification of the binaural audio file reproduction volume depending on the distance between the user and the object. This characteristic can be used to increase the realism of the information provided to the users and to attract their attention. In this manner, for example, the volume can diminish as the user draws away from the object and disappear altogether when they leave the region of the enclosure associated to said object.
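A distance-dependent volume of this kind could be sketched as below. The text only states that the volume diminishes with distance and disappears at the region boundary, so the linear fade, the function name `distance_gain` and the `full_volume_radius` parameter are assumptions.

```python
def distance_gain(distance, region_radius, full_volume_radius=1.0):
    """Gain in [0, 1]: full volume near the object, fading linearly to
    silence at the boundary of the region associated to the object."""
    if region_radius <= full_volume_radius:
        raise ValueError("region_radius must exceed full_volume_radius")
    if distance <= full_volume_radius:
        return 1.0
    if distance >= region_radius:
        return 0.0
    return (region_radius - distance) / (region_radius - full_volume_radius)


# Example with a 5 m region: halfway through the fade zone the gain is 0.5.
gain = distance_gain(3.0, region_radius=5.0)
```

A logarithmic or inverse-distance law would be an equally valid choice; the linear fade is used here only because it reaches exactly zero at the boundary.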

Another specific embodiment of the invention includes selecting, based on the reproduction rules, the binaural audio file that will be reproduced through the headphones from among a group of binaural audio files associated to an object. For example, a sub-region could be defined within the region associated to an object, in such a manner that when the user enters the region, a first binaural audio file with atmosphere sound is reproduced first to draw the user's attention. If the user advances and draws nearer to the object, on entering the sub-region a second binaural audio file will be reproduced, for example an introductory track with information about the object. In another specific example, the reproduction of the second binaural audio file could take place if the user remains within the region associated to the object for a certain time interval. In yet another preferred embodiment, the user can request the reproduction of binaural audio files using the user interface, for example by pressing a button.
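The selection between files associated to one object can be sketched as a small decision function. The text presents the sub-region trigger and the time-interval trigger as two separate examples; the sketch combines them in one function purely for illustration, and the names `Track`, `select_track` and the 30-second default are hypothetical.

```python
from enum import Enum


class Track(Enum):
    NONE = "none"
    ATMOSPHERE = "atmosphere"
    INTRODUCTION = "introduction"


def select_track(in_region, in_sub_region, seconds_in_region,
                 dwell_threshold_s=30.0):
    """Choose which binaural audio file of an object to reproduce."""
    if not in_region:
        return Track.NONE
    # Entering the sub-region, or staying long enough in the region,
    # triggers the introductory track.
    if in_sub_region or seconds_in_region >= dwell_threshold_s:
        return Track.INTRODUCTION
    # Otherwise the user has only entered the outer region.
    return Track.ATMOSPHERE
```

A request made through the user interface could be handled as one more input to the same function.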

Additionally, another advantage of the system and procedure of the present invention is that it allows the reproduction of audio help files for the blind, in such a manner that these can move with greater freedom around the museum.

These audio help files could be locutions informing about the size of the room, the distance between the user and the nearest obstacle in a certain direction, the location of entries and exits, etc. For example, if the blind user wishes to move to the following room, a locution with the word "exit" that seems to come from the location of the exit could be reproduced, in such a manner that the blind user would only have to move towards the place from which the sound comes. Furthermore, a complementary function is available to adapt the volume of the audio files reproduction: an audiometry system is also included in order to assist hearing disabled persons. The system can thus be adapted to the hearing level of each user.

DESCRIPTION OF THE DRAWINGS

For the purpose of complementing this description and helping to better understand the characteristics of the invention, a set of drawings in accordance with a preferred practical embodiment thereof has been included as an integral part of this description, in which the following figures have been represented in an illustrative and non-limiting manner:

Figure 1 shows a block diagram of the audio guide of the invention.

Figure 2 shows a perspective view of the upper module of the audio guide of the invention.

Figure 3 shows a perspective view of the console of the audio guide of the invention.

Figure 4 shows a user who is using the audio guide of the present invention.

Figure 5 shows a plan view of a museum where the regions and sub-regions associated to each painting can be observed.

PREFERRED EMBODIMENT OF THE INVENTION

Below we describe a preferred embodiment of the audio guide (1) of the invention making reference to the figures attached hereto. This audio guide (1) has immediate application, especially in museums, and substitutes the traditional audio guides, considerably improving the experience of the user (20).

The main technological challenges that the development of this audio guide (1) faces are, firstly, to obtain accurate localization indoors without modifying the surroundings and, secondly, to ensure that the volume, weight and energy consumption of the audio guide (1) are sufficiently low for it to be really portable.

As can be observed in figures 1, 2 and 3, the audio guide (1) is comprised of two differentiated units. On one hand there is the upper module (2) that accompanies the user (20) throughout the visit, and which comprises light and ergonomic headphones (4) and, in this example, two image acquisition means (5, 6). The second unit, or console (3), linked to the previous one, is fundamentally an electronic device having sufficient capacity to process the images and generate the binaural audio. These components are described below.

a) Upper module (2)

The upper module is comprised of two main components, the image acquisition means (5, 6) and the headphones (4). The transmission of the digital video signals coming from the image acquisition means (5, 6) towards the console (3) is carried out via USB connection through screened interlaced wiring (7), that runs parallel to the audio signal cable (8).

Image acquisition means (5, 6)

In this example, the image acquisition means (5, 6) are two CMOS cameras, although in other embodiments of the invention the images of a single camera would be enough to carry out MonoSLAM processing. Based on the two signals obtained, and through MonoSLAM processing, the absolute position and angular orientation of the user's (20) head within a two-dimensional map of the enclosure are obtained. The requirements that must be fulfilled by the image acquisition means (5, 6) for these position and orientation coordinates to be correct comprise the use of VGA or higher-resolution cameras, with a wide-angle lens. Additionally, to achieve the necessary position updating frequency, the frame rate of the image acquisition means (5, 6) is at least 30 fps (frames per second).

Moving objects hinder normal MonoSLAM operation. To prevent the image of the image acquisition means (5, 6) from being compromised by moving objects (such as other museum visitors), the image acquisition means or camera can be disposed on the upper part of the upper module (2) at a 45° angle with the horizontal. Alternatively, it can be disposed at a side of the device (in the headphone), since it has been found that certain types of hair can obstruct the camera's view. Additionally, the image acquisition means (5, 6) are covered with a transparent dome (9) to avoid the deposition of dust particles on the lenses and facilitate cleaning of the optical zone.

Image Acquisition Means (5, 6) Technical Characteristics:

• Sensor: BGA CMOS

• Maximum lens aperture: F:2.4

• Lens angle of vision: 120 degrees

• Data format: YUY2

• Colour depth: 24 bits

• Interface: USB 2.0 Video Class: UVC

• Power supply: via USB, < 1 W

Headphones (4)

Given that acoustic contamination from the real ambient sound is considered a problem, we decided to find a headphone (4) coupling system that insulates against the entrance of external noise to a greater extent. Among the existing alternatives to implement the headphones (4), intra-auricular devices have been ruled out for hygiene reasons, and those that encapsulate the ear for being too bulky and compromising transpiration.

The technical characteristics of the headphones (4) used in this example are set out below, although other alternatives that can also be functionally correct are not ruled out:

• Frequency response: 12-22000 Hz

• Impedance: 24 ohms

• Magnet type: Neodymium

• Maximum incoming power: 100 mW

• Sensitivity: 106 dB

• Injected back ABS casing with lacquered ultra-polished finish

• Spherical articulation in both transductors

• Cable outlet through a single headphone (arc with a hollow for cables)

• Disposable covers

• Upper transparent polycarbonate dome

b) Console (3)

The console (3) is divided into the following functional blocks:

• Localisation and orientation means (10)

• Sonorous synthesis means (11)

• Storage means (12)

• Interface medium (13)

• Battery (14)

Localisation and orientation means (10)

The localisation and orientation means (10) is a device which, by using MonoSLAM-type algorithms, obtains the absolute position and orientation of the user's (20) head within an enclosure, such as the rooms of a museum, using the two image acquisition means (5, 6) disposed on the upper module (2) as a sensor. The position obtained consists of the absolute x, y, z parameters and the tilt and orientation angle of the upper module (2), information that is transmitted to a sonorous synthesis means (11) with a frequency of 30 times per second through a serial communications port (SPI, I2C, RS232, ...).

The MonoSLAM algorithms have a fairly high processing load. To carry out the necessary functions with ease, in the present example the localisation and orientation means (10) is an Intel Core 2 Duo 2.4 GHz processor, although it is possible to use other solutions based on low-consumption processors such as Intel Atom or ARM.

Sonorous synthesis means (11)

The sonorous synthesis means (11) is a DSP by Texas Instruments of the C600 family having a storage means (12) in the form of a 4 GB Secure Digital memory in which all the audio tracks are stored in a single channel compressed in MP3 format (this implies more than 70 hours of information). That is, in this example, the sonorous synthesis means (11) and the storage means (12) are implemented in a single device.

The sonorous synthesis means (11) collects the information from the localisation and orientation means (10) and, based on this information and the reproduction rules, selects and generates the binaural audio files to be reproduced.
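The figure of more than 70 hours on a 4 GB card can be checked with simple arithmetic. The text does not state the MP3 bitrate, so the sketch below works backwards from the capacity; reading "4 GB" as 4 x 10^9 bytes and the 96 kbit/s example bitrate are assumptions.

```python
def hours_of_audio(capacity_bytes, bitrate_bps):
    """Playback time in hours for a constant-bitrate stream."""
    return capacity_bytes * 8 / bitrate_bps / 3600


def max_bitrate_for(capacity_bytes, hours):
    """Highest average bitrate (bit/s) that still fits `hours` of audio."""
    return capacity_bytes * 8 / (hours * 3600)


CARD_BYTES = 4 * 10**9   # assumption: "4 GB" read as 4 x 10^9 bytes

print(round(max_bitrate_for(CARD_BYTES, 70)))         # 126984, about 127 kbit/s
print(round(hours_of_audio(CARD_BYTES, 96_000), 1))   # 92.6 hours at 96 kbit/s
```

Under these assumptions, 70 hours or more fit on the card whenever the average bitrate of the single-channel tracks stays at or below roughly 127 kbit/s.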

At certain times, the localisation and orientation means (10) can show indeterminacy when giving the position due, for example, to abrupt movements of the user (20). It is possible to implement a complementary module based on accelerometers and gyroscopes to compensate these deficiencies.

The selected audio files are processed according to the relative position of the user (20) with respect to the objects (17), in this case paintings, and time, obtaining a single stereo track. This processing will depend on the relative distance to the object (17), in addition to the reception angle. The stereo tracks are processed with enclosure-modelling algorithms (reverberation, echoes, ...) to give the experience greater realism. Once all the tracks have been independently processed, they are introduced in a mixer that generates a stereo audio track. Next, this stereo audio track is processed through an outgoing codec and an audio amplification system, giving rise to the signal that is reproduced in the headphones (4). The maximum total consumption of the sonorous synthesis means (11) is 2 W.

Battery (14)

The audio guide (1) of the invention will be powered by a four-cell Li-Po battery (14) that is located within the console (130 x 75 x 27.5 mm). Therefore, the space available for the battery (14) could be 124 x 61 x 12 mm, with a capacity of 2.5 Ah and a nominal voltage of 14.8 V. With this capacity, to obtain an audio guide (1) with four hours of autonomy throughout its useful life, the average consumption of the audio guide (1) should be less than 5 W.
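The relation between battery capacity, consumption and autonomy can be verified with a short calculation. The function name is hypothetical and the calculation ignores conversion losses and capacity fade, which is a simplification.

```python
def runtime_hours(voltage_v, capacity_ah, average_power_w):
    """Ideal runtime: stored energy (V x Ah = Wh) divided by the load in W."""
    return voltage_v * capacity_ah / average_power_w


nominal_energy_wh = 14.8 * 2.5                # about 37 Wh when new
new_runtime = runtime_hours(14.8, 2.5, 5.0)   # about 7.4 hours at 5 W
energy_for_four_hours = 4 * 5.0               # 20 Wh needed for 4 h at 5 W
```

Four hours at 5 W require 20 Wh, a little over half of the nominal 37 Wh, which leaves margin for the loss of capacity over the battery's useful life.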

The console (3) has a removable lid that gives access to the battery (14) and an SD card. The battery (14) is charged after being extracted from the console (3), which allows the audio guide (1) to be used continuously, thereby eliminating battery (14) recharge times. The batteries (14) will be recharged at a station with a capacity for several units and the recharge time must be less than that of a visit by the user (20) to the museum.

Interface (13)

The audio guide (1) of this example also includes a user interface (13) that comprises a monochrome graphic LCD, six push buttons and a switch. Additionally, a sliding switch is disposed on the upper part of the console (3) that blocks the operation of the rest.

Operation

Below we describe the operation of the audio guide (1) of the invention. Fig. 4 represents a user (20) who has contracted the use of the audio guide (1) of the invention at the start of a visit to a museum. Firstly, and specifically in the case of museums, the MonoSLAM system must be started up so that it recognises the reference points of the different rooms. This start-up or initialization can be done "by catalogue", that is to say, it is enough to have a catalogue of the pictures in order to train the system without requiring a physical visit to the museum. In other words, the start-up or initialization can be done remotely and not necessarily close in time to the actual use of the audio guide. This operation is done once, for the different pictures or objects of the museum's exhibition. The reproduction rules associated to each object (17) or work of art must also be defined, normally using specific software or through a computer or PC.

After it has been started up, the audio guide (1) of the invention is ready to operate: as the user (20) moves through each room of the museum, they enter and exit the regions (15) and sub-regions (16) associated to each object (17), which are represented in Fig. 5 and, depending on the reproduction rules associated to each object (17), the corresponding information is reproduced through their headphones (4). In this example, when the user (20) enters the region (15) associated to the object (17), a first type of binaural audio file consisting of atmosphere sound related to the object (17) in question is reproduced. This sound is reproduced when the user (20) is still close to the region (15) limits, and attempts to draw the user's attention. For example, if the object (17) is the painting by Goya "Shootings at Príncipe Pío", the atmosphere sounds could be the shots and cries of the persons who appear in the painting.

Next, when the user (20) approaches the work of art and is located within a sub-region (16) associated to the object (17), a binaural audio file is reproduced with introductory information. This track will have relevant information about the object, but it will not be very long so as not to tire the user (20), and can contain music, dialogues or atmosphere sounds to improve the experience.

Finally, once the introductory track has finished, additional information can be reproduced if the user (20) requests it via the console (3) interface medium (13), for example by pressing a button.

If during the reproduction of any of these tracks, the user (20) moves away from the object (17) towards another location, the reproduction volume will gradually diminish until disappearing altogether when the user (20) leaves the region (15) associated to the object (17).

Claims

1.- Binaural audio guide (1) to provide information to a user (20) relative to objects (17) located within an enclosure, characterised in that each object (17) is associated to at least one audio file and reproduction rules, which comprises:

a) an upper module (2) that can be coupled to the user's (20) head, which comprises headphones (4) and at least one image acquisition means (5, 6) which acquires images from the user's (20) surroundings; and

b) a console (3) connected to said upper module (2), which comprises:

- a localisation and orientation means (10) which determines, according to a simultaneous localisation and mapping process, the position and orientation of the user's (20) head based on the images acquired by the at least one image acquisition means (5, 6);

- a storage means (12) that stores the audio files associated to the objects (17); and

- a sonorous synthesis means that generates, according to the position and orientation of the user's (20) head, time and the reproduction rules associated to the objects (17), a binaural audio file based on at least one audio file associated to each of said objects (17), in such a manner that, in the binaural audio file generated, the sound corresponding to each of the objects (17) seems to come from said specific object (17).

2.- Binaural audio guide (1 ), according to claim 1 , characterised in that the at least one image acquisition means (5, 6) is oriented in a front-upper direction.

3.- Binaural audio guide (1 ), according to any of preceding claims, characterised in that at least one image acquisition means (5, 6) is located inside a transparent dome (9) disposed in the upper module (2).

4.- Binaural audio guide (1 ), according to either claim 1 or 2, characterised in that at least one image acquisition means (5, 6) is located in a headphone (4).

5.- Binaural audio guide (1 ), according to any of the preceding claims, characterised in that it comprises two image acquisition means (5, 6).

6.- Binaural audio guide(1 ), according to any of the preceding claims, characterised in that it also comprises a localisation and orientation means based on accelerometers and gyroscopes.

7.- Binaural audio guide (1 ), according to any of the preceding claims, characterised in that the localisation and orientation means (10), the sonorous synthesis means (1 1 ) and the storage means (12) are implemented by means of at least one of the devices of the following list: a micro-controller, a micro-processor, a DSP, a FPGA and an ASIC.

8.- Binaural audio guide (1 ), according to any of the preceding claims, characterised in that it also comprises a battery 814).

9.- Binaural audio guide (1), according to any of the preceding claims, characterised in that it also comprises a user interface (13).

10.- Binaural audio guide (1), according to any of the preceding claims, characterised in that it also comprises an audiometry system for assisting hearing-disabled persons by adapting the volume to the hearing level of the users.

11.- Binaural audio guide (1), according to any of the preceding claims, characterised in that it also comprises a module which provides a contextualized voice, positioned at a distance of several meters from the user, for assisting visually-disabled or blind persons.

12.- Binaural audio guide (1), according to any of the preceding claims, wherein said localisation and orientation means (10) are configured to determine the position and orientation of the user's head without the need for installing information points.

13.- Operating procedure of a binaural audio guide (1) for providing a user (20) with information about objects (17) located within an enclosure, characterised in that each object (17) is associated to at least one audio file and reproduction rules, and which comprises the following operations:

- acquire, through at least one image acquisition means (5, 6) coupled to a user's (20) head, images of said user's (20) surroundings;

- process the images acquired by a localisation and orientation means (10) based on a simultaneous localisation and mapping process to obtain the position and orientation of the user's (20) head;

- determine, depending on the position and orientation of the user's (20) head and time, if any of the reproduction rules corresponding to any of the objects (17) are fulfilled;

- if so, generate, through a sonorous synthesis module (11), and according to the position and orientation of the user's (20) head, time and the reproduction rules of said objects (17), a binaural audio file based on at least one audio file associated to each object (17), in such a manner that the user's feeling on hearing the binaural audio file generated is that the sound of each object (17) comes from said specific object (17); and

- reproduce the binaural audio file generated through headphones worn by the user (20).
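For illustration, one cycle of the operations listed above can be sketched as follows. The sketch assumes that the head pose has already been obtained from the localisation and mapping stage and reduces it to a planar (x, y, yaw) triple. The `GuideObject` record, the `within` rule factory and the `run_cycle` function are hypothetical names introduced for this example; the rendering and playback of the binaural audio are left out.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pose = Tuple[float, float, float]  # x, y, yaw: hypothetical planar head pose


@dataclass
class GuideObject:
    name: str
    position: Tuple[float, float]
    audio_file: str
    # Hypothetical reproduction rule: True when the object should sound.
    rule: Callable[[Pose, float], bool]


def within(radius: float, centre: Tuple[float, float]) -> Callable[[Pose, float], bool]:
    """Build a purely spatial rule: fulfilled inside a circle around `centre`."""
    def rule(pose: Pose, now: float) -> bool:
        return math.hypot(pose[0] - centre[0], pose[1] - centre[1]) <= radius
    return rule


def run_cycle(pose: Pose, now: float, objects: List[GuideObject]) -> List[Tuple[str, float]]:
    """Check every rule and return (audio_file, bearing) pairs to be rendered."""
    x, y, yaw = pose
    active = []
    for obj in objects:
        if obj.rule(pose, now):
            bearing = math.atan2(obj.position[1] - y, obj.position[0] - x) - yaw
            # Wrap the bearing into (-pi, pi] so that it is relative to the head.
            bearing = math.atan2(math.sin(bearing), math.cos(bearing))
            active.append((obj.audio_file, bearing))
    return active


# Example: only the nearby painting fulfils its rule.
objects = [
    GuideObject("painting", (2.0, 0.0), "painting.wav", within(3.0, (2.0, 0.0))),
    GuideObject("statue", (10.0, 10.0), "statue.wav", within(1.0, (10.0, 10.0))),
]
active = run_cycle((0.0, 0.0, 0.0), 0.0, objects)
```

Each returned pair would then be passed to the sonorous synthesis stage, which is outside the scope of this sketch.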

14.- Operating procedure of a binaural audio guide (1), according to claim 13, characterised in that the preceding operations are carried out with a frequency of above 10 cycles per second.

15.- Operating procedure of a binaural audio guide (1), according to any of the claims 13 to 14, characterised in that the reproduction rules comprise a grammar of spatial reproduction rules and temporal reproduction rules.

16.- Operating procedure of a binaural audio guide (1), according to any of the claims 13 to 15, characterised in that it comprises the step of reproducing a first type of binaural audio file when the user (20) enters the region (15) associated to an object (17).

17.- Operating procedure of a binaural audio guide (1), according to claim 16, characterised in that it comprises the step of reproducing a second type of binaural audio file when the user (20) enters a sub-region (16) located inside the region (15).

18.- Operating procedure of a binaural audio guide (1), according to claim 16, characterised in that it comprises the step of reproducing a second binaural audio file when the user (20) remains for a certain time interval within the region (15).
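As a non-limiting illustration of the spatial and temporal rules of claims 16 to 18, the following sketch selects which type of audio applies to one object. Circular regions, the combination of the sub-region condition and the dwell-time condition in a single function, and the name `select_audio` with its parameters are assumptions made for this example only.

```python
import math
from typing import Optional, Tuple


def select_audio(
    user_xy: Tuple[float, float],
    obj_xy: Tuple[float, float],
    region_radius: float,
    subregion_radius: float,
    seconds_in_region: float,
    dwell_threshold: float,
) -> Optional[str]:
    """Return None, "first" or "second" for one object.

    Assumes the region (15) and sub-region (16) are circles centred on the object.
    """
    distance = math.dist(user_xy, obj_xy)
    if distance > region_radius:
        return None  # outside the region: no rule is fulfilled
    if distance <= subregion_radius:
        return "second"  # spatial rule: user is inside the sub-region
    if seconds_in_region >= dwell_threshold:
        return "second"  # temporal rule: user has remained in the region
    return "first"  # user has entered the region only


# Example: a user 4 m from the object, with a 5 m region and a 2 m sub-region.
choice = select_audio((6.0, 0.0), (10.0, 0.0), 5.0, 2.0, 0.0, 30.0)
```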

19.- Operating procedure of a binaural audio guide (1), according to any of the claims 13 to 18, characterised in that it also comprises the additional operation of modifying the binaural audio file reproduction volume based on the distance between the user (20) and the object (17).
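One possible way of modifying the volume with distance is sketched below, purely as an example. The claim does not fix any attenuation law: the inverse-distance rule, the reference distance, the minimum gain and the name `distance_gain` are assumptions.

```python
def distance_gain(distance_m: float, reference_m: float = 1.0, min_gain: float = 0.05) -> float:
    """Linear gain in [min_gain, 1.0] that falls off as 1/distance.

    Inside the reference distance the gain is held at 1.0 to avoid amplification;
    far away it is clamped to min_gain so that the object remains audible.
    """
    if distance_m <= reference_m:
        return 1.0
    return max(min_gain, reference_m / distance_m)


# Example: at twice the reference distance the amplitude is halved.
gain = distance_gain(2.0)
```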

20.- Operating procedure of a binaural audio guide (1), according to any of the claims 13 to 19, characterised in that it comprises the step of initializing said binaural audio guide (1).

21.- Operating procedure of a binaural audio guide (1), according to claim 20, characterised in that said binaural audio guide (1) is initialized by catalogue.